Skip to content

Proceeding Cautiously with DB2 Event Monitors

NOTE: The following article is written from a DB2 LUW version 9, Unix OS perspective. I assume some familiarity with the DB2 System Monitor; this is not a performance primer.

The DB2 System Monitor is the database administrator’s ally in evaluating database performance. The¬†benefit, however, from having detailed metrics about your database’s behavior has to be weighed against potential database performance degradation when either executing GET SNAPSHOT or activating an event monitor. In this post I compare the relative cost of using snapshot, event monitor, and other performance utilities, and demonstrate how to manage their potential risks.

First, a word about latches. The System Monitor (via GET SNAPSHOT or through activation of an event monitor) acquires latches ( “. . .¬† low level serialization mechanisms used to protect shared data structures . . . ”1 ) to ensure accurate reporting. System Monitor latches can enqueue (delay) any database process or operation. Both user applications and database utilities may be affected. Significant global delays due to latches can cause a database hang (hang-wait) condition.2 Hang-waits can have many secondary symptoms: statements appear to be in lock contention but there is no obvious resource in contention; connection attempts hanging; utilities refusing to terminate.

DB2 monitoring and problem determination utilities vary widely in their database performance impact.

  • db2pd (first made available in DB2 LUW version 8.2) causes the least impact, because it does not acquire latches. db2pd as a relatively new utility is not as easily used as snapshots or event monitors, but it appears to be winning mindshare among database administrators. It’s text output is even more difficult to summarize and parse than either snapshot or event monitor output (db2pd will be the subject of a future post in this blog).
  • GET SNAPSHOT has a somewhat heavier potential performance footprint (which varies by number of monitor switches active, type of workload, and mix of static vs. dynamic SQL); GET SNAPSHOT has, according to IBM Toronto Laboratory’s Steve Rees3 , anywhere from a 1-10% effect.4
  • Event monitors have potentially the heaviest impact on a database and pose the greatest (potential) risk (depending upon the kind of monitor, transactions per minute, and event monitor parameters). Because the Event Monitor is a trace, the performance impact is directly proportionate to the number of events being captured. Carelessly configured STATEMENTS and TRANSACTIONS event monitors can do harm in a high Transactions Per Minute (TPM) environment ( TPM > 3000 ).

Taming the Event Monitor

The impact of event monitors on database performance is controlled by parameters in the CREATE EVENT MONITOR statement. Changes will require a DROP and new CREATE for the event monitor.

Command syntax for CREATE EVENT MONITOR:




Source: DB2 Version 9 for Linux, Unix and Windows SQL Reference Volume 2

Performance Considerations for Event Monitor WRITE TO PIPE/FILE/TABLE

WRITE TO PIPE: Of the three methods of writing event monitor output, write to pipe causes the lowest database impact. It is also the easiest to initially configure (because there are few options available). Initialization and cleanup are simple (there are no event record stream files written to disk). It is the most transparent to use but leaves few options open to the DBA if pipe buffer overflow occurs; in that case performance data will simply be lost. Buffering is entirely the responsibility of the OS and if the buffer overflows records will be discarded. You will see overflow records for WRITE TO PIPE, so at least you will know when data has been lost. This method may be best for use of long-running unattended event monitors where a specific problem has not yet been identified.

WRITE TO FILE: Write to file offers database-level buffer blocking and the opportunity to avoid lost data, an advantage over WRITE TO PIPE. MAXFILES and MAXFILESIZE parameters allows control over space utilization by the event monitor. I have observed that the performance impact seems low, even on high transaction production systems. Because of the relatively small footprint, I prefer to use WRITE TO FILE as my default initial trace before escalating to a more comprehensive WRITE TO TABLE trace.

The disadvantage of WRITE TO FILE is that the text output is very difficult to analyze. The Unix-platform DBA will probably need to use text parsing utilities (sed, awk, sort, grep, etc.) or write a perl script in order to easily summarize data.

WRITE TO TABLE: Write to Table is the most powerful means of dumping Event Monitor data, but using default options this method also has the greatest performance impact. Write to Table using all defaults can cause noticeable or critical slowdown in a high TPM application. The DBA should use this approach with great caution. On the other hand, with proper tuning (see below) the ability to browse and filter the resulting trace data using SQL can make short work of performance pattern analysis.

Effects of Event Monitor BLOCKED versus NONBLOCKED Buffering

The event monitor file and table write options buffer their output for performance reasons. You can control buffering behavior for both writing to file and writing to table via the BUFFERSIZE and BLOCKED/NONBLOCKED parameters (pipes have their own internal buffers at the operating system level).

BUFFERSIZE: When an event monitor is started, two buffers are created in order to allow asynchronous I/O. The BUFFERSIZE parameter expresses the size of each of the two event monitor buffers; they will be both sized to value = BUFFERSIZE, so the actual space allocated for event monitor buffering total, in bytes, will be 2 X BUFFERSIZE X page size (4K). BUFFERSIZE, the number of 4K pages allocated, has a default (and minimum) value of 4.5 recommends, if you are noticing performance impact from using event monitors, a larger buffersize such as 512. If yours is a STATEMENTS monitor, a larger BUFFERSIZE will also allow more complex SQL statements to be reported, since a statement that exceeds the size of the buffer will be overflowed (truncated).

BLOCKED/NONBLOCKED: “A blocked event monitor suspends the database process that is sending monitor data when both of its buffers are full. This is to ensure that no event records are discarded while the blocked event monitor is active. The suspended database process and consequently, any dependent database processes cannot run until a buffer has been written. This can introduce a significant performance overhead, depending on the type of workload and the speed of the I/O device. Event monitors are blocked by default.”6 BLOCKED is the default. In my experience BLOCKED event monitors which WRITE TO FILE do not cause problems for applications even with a TPM of over 4000. On the other hand, WRITE TO TABLE can be substantially slower, and blocking the event monitor for table writes may result in excessive hold on latches and serious application slowdown.There are ways to improve WRITE TO TABLE performance (see below) but in a high transaction environment you may still have little choice but to use a NONBLOCKED event monitor. Using NONBLOCKED may result in substantial loss of performance data, but the advantages of WRITE TO TABLE may still under those circumstances have sufficient merit that some data loss is acceptable (If data is lost, Overflow Event records will be written to the event monitor output; you can experiment with a NONBLOCKED event monitor knowing that you will know if data is lost, and how much).

Tuning Specifically for Write to Table

Besides using the NONBLOCKED buffer option and a larger BUFFERSIZE, Write to Table has several more parameters that can be used to tune its performance and reduce impact on the database application.

  • TRUNC: If you use TRUNC, STMT_TEXT will be stored in a VARCHAR instead of a LOB datatype column. Without a LOB column table accesses will be much more efficient. The downside is that reported complex statements may be truncated.7
  • IN parameter: Using “IN…” places the the event monitor output table in a dedicated tablespace. This may not help all databases, but it can allow avoidance of I/O conflicts through container separation. It might also allow a larger page size if such does not already exist, to accommodate potentially long row size of the event monitor output table.8 It is also my recommendation that you use a dedicated tablespace for event monitor tables so that growth of the table does not jeopardize space requirements of other tables in your database.
  • Use INCLUDES/EXCLUDES clause to reduce the set of monitor elements gathered. A shorter table row will have better INSERT performance.
  • Use of WHERE clause to exclude events of no interest. This can also make problem analysis easier through the elimination of background noise in the data.

To summarize, monitor the impact of WRITE TO TABLE event monitors very carefully, especially on high TPM production systems. If you see performance degradation, first apply larger BUFFERSIZE and NONBLOCKED parameter. The above additional recommendations can further lessen the footprint of the event monitor. You may also find that you do not, after all, benefit from writing event monitor output to a table. In that case, WRITE TO FILE is easier, simpler, and performs better.

Event Monitor Space Management

The final problem encountered in using the event monitor is that event stream output can quickly fill up a target filesystem (WRITE TO FILE or WRITE TO TABLE in a SMS tablespace) or tablespace (WRITE TO TABLE in a DMS tablespace). A full tablespace or filesystem can trigger novel and unwelcome challenges. To guard against this, consider the following:

If Using WRITE TO PIPE

Space overflows will not occur since the stream via pipe will be limited to the pipe buffer. If buffer is exceeded, event monitor output will be discarded.

If Using WRITE TO FILE

  • When defining the event monitor, ensure automatic termination of the event monitor by use of parameters MAXFILESIZE and MAXFILES. The default behavior for WRITE TO FILE event monitors is to write forever, until deactivated, or until the event monitor receives I/O errors from the filesystem being full. By determining how much free space is available on the target filesystem using the df command, you can determine a safe amount of event monitor stream data to write, and define that limit in bytes as 4K X MAXFILESIZE X MAXFILES. When the event monitor has written number of files equal to MAXFILES, the monitor will terminate.
  • Write to a filesystem segregated from filesystems used by other essential processes. In particular do not write event monitor output within filesystems which also contain database directory, logging directory, or paths defined for any SMS tablespace.
  • Write and run a script to move closed event monitor files out of target directory while event monitor is active, to prevent space usage from increasing. The script could potentially run compression (e.g. gzip), move the files to another filesystem, write them to tape, or scp them to another computer. The DB2 event monitor allows output files to be moved as long as they have been completely written and closed.
  • Remember to delete event monitor output when you have completed your performance analysis.
  • Monitor filesystem disk usage diligently, preferably via cron-driven script.

If Using WRITE TO TABLE

  • Contain the effect of unrestrained growth in an event monitor table by placing the table in its own tablespace.This is a good idea regardless of whether you use a SMS or DMS tablespace. Run the event monitor for a brief period, then study space utilization, to allow you to extrapolate how much space will be required in order to run the intended trace. You may need to increase pages allocated (DMS) or add more containers or space to filesystem (SMS). In extrapolating page space requirements, make provision for indexes that might also be required for query performance.
  • Growth can be slowed by truncating statement text (if captured) using the TRUNC parameter.
  • Growth can be slowed by using a smaller subset of data elements using the INCLUDE/EXCLUDE parameters.
  • Write and execute regularly a prune script to delete performance data based on timestamp.

In a DMS Tablespace

  • Use PCTDEACTIVATE parameter to terminate the event monitor before 100% of tablespace pages are used.

In a SMS Tablespace

  • Use dedicated filesystem for containers allocated to your event monitor table tablespace.
  • Monitor filesystem disk usage diligently, preferably via cron-driven script.

Additional References

Footnotes

  1. http://www.geekinterview.com/question_details/13127 []
  2. IBM APAR’s ambiguously use the terms “hang” and “hang-wait” without explanation. I would suggest using “hang” to refer to a “wait forever” condition on the database. A “hang-wait”, on the other hand, refers to a wait of finite and problematic duration. []
  3. srees -AT- ca.ibm.com []
  4. Steve Rees’ work is cited in an article by Chris Eaton in IT Toolbox. Rees, in his unpublished IDUG 2004 presentation titled The Doctor is In! Advanced Performance Diagnostics in DB2, offers the following observations:
    • Snapshot has more of an effect via increased CPU consumption than through impact on concurrency, though of course latches are active.
    • Snapshot in a system with mostly Dynamic SQL will have greater effect than in a system with Static SQL. Snapshot impact in a DSS (running complex SQL) is, except for Table and Bufferpool Switches, generally greater than in an OLTP database.
    • Dynamic SQL snapshot will impact 5-10% based on number of statements per minute.
    • Lock Switch impact doesn’t exceed 1-3%.
    • Table and Bufferpool switches tend not to go much above 2% cost.
    • Sort, Statement, and UOW Switches all have the ability under the certain workloads to incur 5-10% performance cost.
    • The cost of activation of ALL switches can be as low as 3% in an OLTP running Static SQL or go as high as 10% in a DSS running complex SQL.
    • Turning off the TIMESTAMP switch will lower cost by a small amount.

    []

  5. Steve Rees, The Doctor is In! Advanced Performance Diagnostics in DB2, IDUG 2004 presentation. []
  6. DB2 Information Center []
  7. Steve Rees, The Doctor is In! Advanced Performance Diagnostics in DB2, IDUG 2004 presentation. []
  8. Steve Rees, The Doctor is In! Advanced Performance Diagnostics in DB2, IDUG 2004 presentation. []

One Comment

  1. There is so imteresting for me! Thank you!

    Posted on 10-Feb-09 at 05:26 | Permalink

Post a Comment

Your email is never published nor shared. Required fields are marked *
*
*