Firebird TPC Benchmarking Part 1 - Porting the TPC-H suite
Friday, December 5, 2008, 02:10 PM -
,
,
Posted by Administrator
I had a look on what is needed to port the decision support system (DSS) based TPC benchmark
TPC-H to Firebird. The idea is to have a standardized toolset based on the official TPC-H test suite for Firebird available to run tests against a Firebird database typical for a DSS / OLAP environment. I think, this test suite hopefully can produce useful measures which can be used to, e.g.:
* compare Firebird running in different hardware environments (I/O storage, RAM, number of CPUs / Cores, ...)
* compare raw performance without any physical model enhancements like primary key constraints, indices, ... with an optimized physical model
* see how well the optimizer is doing in case of existing indices
* compare different page buffer values
* compare different page sizes
* possibly get results from Firebird users all over the world
From current DWH projects, we see, that Open Source on the server-side is getting more and more important. So getting some concrete performance results from customers running on different hardware is something I wanted to do for quite some time now.
Porting the TPC-H test suite to Firebird is looking promising. This is what I have done so far:
* Downloaded the
TPC-H specification* Downloaded the
TPC-H toolset* Installed
Cygwin on my desktop PC
* Built the TPC-H tools
dbgen and
qgen in Cygwin
* Created an empty Firebird 2.1 database and executed
dss.ddl to create the tables
* Ran
./dbgen -v -s 1 in Cygwin to create data for a "scale 1" (~ 1GB; scale 10 => ~ 10GB ...) database in
*.tbl files (one file for each table)
* Implemented an ETL-Job with the Open Source ETL product
Kettle to fill my database with data based on the *.tbl files => the filled Firebird database is ~ 1.5 GB in size
* Made the SQL query templates compatible with the Firebird 2.1 syntax
* Set the DSS_QUERY environment variable to the directory with the adjusted query templates
* Ran
./qgen > ./queries/all_dss_queries.sql in Cygwin which creates a SQL file with all 22 test queries that are part of the TPC-H test suite
* Ran the first query to see whether the result is the same as the one documented in the specification. The query looked like:
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= dateadd(day, -90, date '1998-12-01')
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
Execution time ~ 63 seconds.
I'm almost done with the TPC-H port. I know have to adjust the query file "all_dss_queries.sql" a bit, so that it can be executed with isql or
fsql without troubles (e.g. with execution plan and timing enabled, ...). The author of fsql, Ivan Prenosil, probably will add a SET ROWCOUNT clause to his tool, so that the SQL script can be run without the need to add the FIRST clause to the SQL statements. Some of them shall return only a particular number of records, as defined in the TPC-H specification.
After that, I will run the script on my Firebird 2.1 database (Classic Server) under Windows 2003 Server 64bit to get a raw performance measure (without indices, ...). Then I will create the constraints and indices as permitted in the specification to see, if it goes faster (I guess it will *g*). Then I can give different page buffer value a try, different page sizes, or whatever. Another thing would be to create a scale 100 database with ~ 100GB of size, but this will take some time. ;-)
If you are interested in getting the TPC-H port for Firebird including SQL scripts, ETL transformation files, ..., then let me know. The only documentation on that port is this blog so far. Us usual, documentation ...
Thomas