Gratifying News

October 6th, 2006


I’m much relieved to learn that my sole television news source, The Daily Show, is apparantly as substantive as any of the regular news networks whose coverage I find to be eye-rollingly dull, irrelevant, laden with sound-bites and photo-ops, and just plain stupid.

If you’re not from these shores then here’s a “flavor“.

Asking for too much

October 5th, 2006

A little light relief from parallelism — a question on the forums at DBASupport.com has captured my interest. One of the clearest examples observed in the wild of a person without initiative, possibly taking some course for which SQL is required but not of interest to the attendee.

Is it beyond reason to ask that someone just look up a simple example of what they’re trying to do and compare their own efforts? Apparantly so.

Linux 2.6 Kernel I/O Schedulers for Oracle Data Warehousing: Part II

October 2nd, 2006

Here are some preliminary test results for a single disk query.

The disk is a Seagate Cheetah ST318404LC, of 18.37Gb and 10,000 rpm. Manufacturer’s read seek times are 5.2ms (average), 0.6ms (single track), 10ms (max full seek).

A single tablespace was created on the disk, with a single table create and populated as in this script.

create tablespace sandbox_one_disk
datafile '/opt/d1/sandbox__one_disk.dbf' size 4g
extent management local uniform size 256M
segment space management manual
/

create table t2
   (col1 varchar2(1000))
tablespace sandbox_one_disk
pctfree 99 pctused 0
noparallel
nologging
nocompress
nomonitoring
storage (minextents 8)
/
insert /*+ append */ into t2
select lpad(rownum,100)
from   dual
connect by level <= 250000
/

begin
   dbms_stats.gather_table_stats
      (ownname          => user,
       tabname          => 'T2',
       estimate_percent => 1,
       block_sample     => true,
       method_opt       => 'for all columns size 1');
end;
/

This gave a table size of just less than 2Gb to be queried as follows:

set timing on echo on heading off
select /*+ noparallel(t2) */ count(*) from t2;
select /*+ parallel(t2 2) */ count(*) from t2;
select /*+ parallel(t2 3) */ count(*) from t2;
select /*+ parallel(t2 4) */ count(*) from t2;
select /*+ parallel(t2 5) */ count(*) from t2;
select /*+ parallel(t2 6) */ count(*) from t2;
select /*+ parallel(t2 7) */ count(*) from t2;
select /*+ parallel(t2 8)  */ count(*) from t2;
select /*+ parallel(t2 20) */ count(*) from t2;
select /*+ parallel(t2 40) */ count(*) from t2;
select /*+ parallel(t2 80) */ count(*) from t2;

The wall clock time for each query was noted and graphed.

Each test run was executed with the server rebooted and reconfigured to use a different i/o scheduler each time. The scheduler in use was verified using:

dmesg | grep scheduler

All four available schedulers were tested with the following results:

In the above results a parallelism of “1″ represents nonparallel query without direct serial reads being used.

I shall pause here for comments … :)

Linux 2.6 Kernel I/O Schedulers for Oracle Data Warehousing: Part I

September 29th, 2006

Quite some time ago I whined about the issue of getting excessive head movement on long sequential reads of the type that data warehouses commonly use. The problem, to paraphrase my earlier article, was that during a read of a large amount of contiguous data the operating system/RDBMS would be too amenable to satisfying other read requests from the same disk, hence incurring head movement too frequently.

This issue popped back into my head after being directed through Log Buffer #11 at Mark Rittman’s site to an article by Curt Monash titled “Is data warehousing all about sequential access?” and which matched my thoughts very well.

I wondered whether the miracles of open source software had anything to offer that might help so I asked the reasonably eclectic crowd at Joel On Software for help — after all, even Google doesn’t help when you don’t know what question you’re asking!

So now that I knew what question to ask it was relatively simple to look for information on Linux 2.6 i/o scheduling, and I found some very interesting resources (IMO).

The 2.6 kernel introduced four types of i/o scheduling.

  • Completely Fair Queueing
  • Deadline
  • NOOP
  • Anticipatory

Have a read about them here. Now also in that article it makes the following unsupported assertion: “.. the Deadline scheduler out-performed CFQ for large sequential read-mostly DSS queries”. That made me a little suspicious, and here’s why.

When Oracle is performing a full table scan using parallel query it is continually issuing read requests of around 1Mb (for example) for a large set of blocks that are contiguous. Hence there ought to be little or no latency due to disk head movement. When another parallel query slave, possibly for the very same query as the first, is also trying to retrieve a large set of contiguous data the danger is that the disk head will continually be flicking around between the two processes, incurring latency each time it does so. The most efficient scheduling method would therefore appear to me to be one that allows the second process to wait while satisfying more requests from the first process, thus reducing the disk head movement and increasing the rate of blocks read from disk.

With that in mind, consider this description of the deadline scheduler: “The deadline scheduler implements request merging … and in addition tries to prevent request starvation (which can be caused by the elevator algorithm) by introducing a deadline for each request. In addition, it prefers readers. Linux does not delay write processes until their data are really written to the disk, because it caches the data, while readers have to wait until their data are available.” Doesn’t sound to me like to would help.

Here is the description of the anticipatory scheduler from the same reference and with my own emphasis: “The anticipatory scheduler implements request merging … and in addition optimizes for physical disks by avoiding too many head movements. It tries to solve situations where order cialis order viagra online purchase cialis buy viagra order viagra buy cialis order levitra order cialis cheap cialis buy viagra many write requests are interrupted by a few read requests. After a read operation it waits for a certain time frame for another read and doesn’t switch back immediately to write requests. This scheduler is not intended to use for storage servers!“.

The anticipatory scheduler (OK, or “elevator”) introduces a very small delay, in the order of one millisecond, following the fulfilment of a read request to see if the same process is going to submit a request for data contiguous with the previous one. If it does then the scheduler will satisfy that request before considering others, thus saving head movement. And the best news is that the delay and other control parameters are configurable at the device level, giving Obsessive Compulsive Tuners some more factors to worry about.

So, how to test this?

I came up with the following plan. For each of the four scheduling options I would measure the performance of a single query that selects a lot of contiguous data via full table scan, and vary the degree of parallelism to see how the query time (and hence the rate of disk reads) varied. The query would be a simple SELECT COUNT(*) to minimise the possibility of the CPUs becoming a choke point for performance.

I wheeled out my handy Poweredge 6400 with four Xeon 700’s, 1Gb of RAM, and four Ultra160 disks of around 16Gb each on an Adaptec AIC7XXX card, and on which is conveniently running the 2.6.9-34.ELsmp kernel and Oracle 10.1.0.3.0 EE with Partitioning.

More later ….

Presentation of Summaries and Materialized Views

September 25th, 2006

I submitted an abstract for a presentation at the Rocky Mountain OUG Training Days 2007, on the subject of materialized views, summaries, fast refresh and whatnot. Obviously a lot of the substance will come from the material I’ve been posting here for the past year-and-a-bit, emphasising conceptual understanding, analysing the internals of the refresh process, a caveat here and there, the ongoing search for optimisation techniques, and with other material to tie it together as a coherent topic.
Since this is aimed at all levels of experience, from “what is a materialized view” through to “Ah, but in the case of partitioning of a HOLAP-based MV …” what issues would you like to see covered in the talk?

Here is what I have so far, in no particular order …

  • When and what to summarise.
  • Components of a materialized view (table, metadata etc).
  • Query rewrite.
  • Complete and fast refresh.
  • Fast refresh following conventional path, direct path, and partition DDL operations.
  • Propagating changes to the MV data.
  • Performance issues.
  • Alternatives to MV use and tuning manual refresh of summaries.
  • DBMS_Advanced_Rewrite
  • Caveats (32-column limitation on GROUP BY, ORA-12034, PCT and prebuilt tables etc)

A lot of ground to cover, but I’ve optimistically applied for a 90 minute slot — I think I could manage double that at a pinch!

What would you emphasise out of that, and what would you add?

IBM Sales Videos

September 19th, 2006

I doubt that there’s an I.T. sales consultant anywhere who couldn’t learn something from the way that IBM does it.

Restaurant Review: A Place to Avoid

September 14th, 2006

Should you ever find yourself in Lincoln, Nebraska, don’t make things worse by eating here.

join_histogram_output_bji.txt

September 11th, 2006

join_histogram_script_bji.txt

September 11th, 2006

Histograms For Join Predicates (or “Hooray for technical content!”)

September 11th, 2006

Someone sent me a scenario the other day similar to this:

A small dimension table, for example of US State names, has a synthetic primary key (say STATE#) and a unique key on the real-world value (STATE_NAME), and a much larger fact-type table has an indexed foreign key to this lookup table. The distribution of values in the fact table is very skewed. Is it possible for the optimizer to take into account the skew of STATE# in the fact table when it is joined to the dimension table and a predicate is placed on STATE_NAME?

Well, a tricky issue. Oracle has to resolve the predicate on dimension.STATE_NAME into a predicate on dimension.STATE# and thence into a predicate on fact.STATE#. The optimizer will do a very similar, and very effective, operation that I’ve written about a couple of times before through dynamic partition pruning based on joins, so my first reaction was to suggest implementing a list partitoning, or list subpartitioning, scheme on the table. For me that would be the gold standard of approaches, but it might not be possible, or might be not completely effective, if the organisation doesn’t want to pay for the partitioning option, or the logical limits of composite partitioning have already been reached, or if a table rebuild is too large a hammer to take to this nut.

A related thought was that if the partitioning option was in place then a summary table using a partitioning scheme more sympathetic to this aim could be implemented. For sure it’s a big step to take, especially if it means duplicating the entire data set of the current fact table, but it would extend the range of queries that can benefit from partition pruning very handily. The summary table could of course include the join to the dimension table and/or aggregate the result set, giving further benefits.

Another thought that occured was to create a bitmap join index, effectively allowing predicates to be placed on a hidden column of fact.STATE_NAME. I ran a couple of tests on 10.2.0.2 and it appeared to be ineffective in respect of histogram recognition. The estimated cardinality was always 1% of the total number of rows in the (non-partitioned) sample fact table even with hidden columns analyzed. Maybe I missed something there, and if I did then please enlighten me as it’s a pretty easy implementation.

Here’s a funky technique that actually did work — create a function to accept STATE# and return STATE_NAME, either through querying the dimension table or through embedding the logic in a PL/SQL CASE statement. Once indexed columns had been analyzed a predicate placed on this function (instead of the dimension table) did result in correct estimation of cardinality. The list of objections to this technique is pretty broad, including the possibility of returning the wrong result once the dimension table changes and a few other issues, but whether the risk is worth adopting or not is up to the user. One might choose to use DBMS_Advanced_Rewrite in 10g+ to have predicates against dimension.STATE_NAME rewritten to the hidden column, but the feasibility of that was not part of the test I ran/

There is one last suggestion of course — rebuild the table to use natural keys instead of synthetic keys. This would yield the same problem if the lookup table contained a commonly predicated additional level in the hierarchy, (eg. CONUS_Y_N, TOO_HUMID_IN_SUMMER or HAS_HELMET_LAW) and we’d be back to “square uno” once more.

To sum up, here are the solutions I thought of:

  1. Partition or subpartition the fact table on STATE#. (preferred option)
  2. Create a summary table with partitioning or subpartitoning on STATE#. (uses most space and slows data load, but very flexible and powerful)
  3. Create a function-based index on fact to perform the lookup, and query that value instead. (a bit flaky, but it works without major system impact)
  4. Rebuild the fact table based on the STATE_NAME instead. (still limited in multi-level hierarchies)

All other suggestions gratefully accepted. I’m really puzzled by not be able to achieve this with a bitmap join index, I must say. I’m sure I must have missed something simple

Here is my test script for the function-based index method, and here is the script output on 10.2.0.2.

EDIT: Following on from Pete’s suggestion in the comments, here is a script showing the attempt at a bitmap join index method, and again the output on 10.2.0.2.