Native DB2 LUW snapshot utilities capture the performance characteristics of the database at the moment they are run, but they are not easily used to identify performance trends over days, weeks, and months. There are obvious advantages to being able to compare performance trends over time; furthermore, many snapshot data elements are not meaningful unless seen changing over time. With the advent of snapshot administrative views in DB2 version 9, automating historical performance capture has become easy for Unix database administrators.
This article shows the administrator how to use Unix shell scripts to build a home-grown database performance history system without resorting to third-party tools. Shell scripts automate performance capture and provide 24×7 monitoring support. The author also provides about forty views representing common calculated metrics over the resulting performance history data.
The target audience for this article is Unix DB2 database administrators and system support staff tasked with DB2 database monitoring in Unix environments. Solaris, AIX and Linux DB2 DBAs running DB2 version 9 and later will benefit.
The benefits of consistently capturing historical performance data using snapshot administrative views are many:
- Many levels of granularity are possible. Look at performance characteristics over an hour, day, week or month.
- Quick access to hundreds of snapshot data elements.
- Understand overnight/weekend performance problems.
- Retrospective debugging of database problems. How many times have you found out about a problem hours after it actually manifested, too late to run snapshots and left to speculate about its source? DB2 issues often manifest in the middle of the night or on weekends. By the time you’re on the scene, the evidence is long gone, leaving little besides db2diag.log for backtracking. Having historical snapshot performance data on hand will allow you to more easily track down the gremlin.
- Visualization potential using graphing software.
This longitudinal capability is what made DB2 monitoring tools such as those from Database Guys ((later DGI; purchased in 2003 by BMC Software)) so powerful. You don’t need to purchase a third-party tool to gain this capability. You just need to build your own capture system. It is not difficult.
Prior to DB2 version 8, the only way to obtain snapshot data was to generate a text file from GET SNAPSHOT output and use shell scripts, Perl, or other parsing tools to aggregate the data. IBM introduced snapshot table functions in version 8.1. These represented a huge leap forward, giving administrators access to snapshot data via SQL. In DB2 version 9 IBM introduced snapshot administrative views, which are now the preferred means of accessing snapshot data and are easier to write against than table functions. Snapshot administrative views present snapshot data much as the system catalog views do.
Both snapshot table functions and administrative views are very useful, providing snapshot data about current database performance through the full power of SQL. The behavior of the database at this moment in time might be all you need to know. But the ugliest problems seem to happen outside of the 9-to-5 window, and single-time-slice snapshots don’t permit us to perceive patterns such as trends, recurring performance spikes, or slow degradation in database performance over the span of weeks. Snapshot data becomes vastly more useful when stored over time in performance monitoring tables.
Here’s how.
Assumptions
- Version: DB2 LUW Version 9.
- Unix OS is used for scripting examples, but the system could be adapted for Windows.
- Either SYSADM authority or a combination of SYSMON and other authority [for table creation].
- Understanding of System Monitor data elements and performance metrics. (see References.)
- My performance view DDL and related queries were written assuming a single database per instance, because that was true in the shop where I developed this version of the capture system. If you have multiple databases per instance, modify my examples (primary keys, metric view SELECT columns, predicates and GROUP BY clauses of report views) to include DB_NAME. If you are a DPF shop, you will also need to make a similar change for column DBPARTITIONNUM.
Create snapshot historical tables
This is possibly the best reference for available snapshot administrative views.
This article provides CREATE DDL for only five snapshot capture tables. These are the only tables you will need for the specific performance reports provided in this series (strictly speaking you won’t need SNAP_TBS_CFG_T either, but I had it, so I decided to include it).
If you want to store the full set of columns from an administrative view, the quickest way to create your historical table is:
- Use the CREATE TABLE ... LIKE syntax to create the table, for example:
  db2 create table DBA.SNAPDB like SYSIBMADM.SNAPDB
- Use ALTER TABLE ... ALTER COLUMN ... SET NOT NULL to make not null those columns which will comprise your primary key, for example:
  db2 alter table DBA.SNAPDB alter column SNAPSHOT_TIMESTAMP set not null
- You may need to reorg your new table, for example:
  db2 reorg table dba.snapdb
- Create your unique index for the primary key, and ALTER TABLE ... ADD PRIMARY KEY. (( You will notice that my CREATE INDEX DDL omits DB_NAME, because I developed the DDL for a single-database instance. You will want to modify the DDL for a multiple-database instance (if you are a DPF shop, you will also need to make a similar change for column DBPARTITIONNUM). I chose table schema ‘DBA’ for the performance tables; you can use whatever name you want, and edit the performance views to reflect your changes. ))
If you do not want to store the full set of columns from the SYSIBMADM view, you can create custom DDL for the administrative view. Unfortunately db2look will not generate DDL directly from an administrative view, because SYSIBMADM objects are not true user views. What you can do to make quick work of editing your own DDL is to follow the first step above (CREATE TABLE ... LIKE ...) and then run db2look against the resulting object. Drop the table you created after you have produced the DDL from db2look. Massage the DDL to include only those columns you wish to keep for history. Create the table from your edited DDL, then create your primary key. (( Thank you to Frederik Engelen for help simplifying the process of creating administrative view history. ))
If you choose to capture a smaller subset of administrative view columns, change the subselect of your populate INSERT to use an explicitly named set of columns (instead of SELECT *).
The snapshot administrative views I use in the remainder of my examples are:
| Snapshot Admin View | Download DDL |
| SYSIBMADM.SNAPDB | dba.snapdb.ddl |
| SYSIBMADM.SNAPTAB | dba.snaptab.ddl |
| SYSIBMADM.SNAPAPPL | dba.snapappl.ddl |
| SYSIBMADM.SNAPSTMT | dba.snapstmt.ddl |
Download all these DDL files as a single zip.
Create these tables, and proceed to the next step: Populate snapshot historical tables
Populate snapshot historical tables
These script examples use cron and Korn shell, but you can substitute your own scheduler and scripting tool. For the performance granularity I sought, I had the populate script run every five minutes. Your executing user ID must have at least SYSMON authority and INSERT privilege on the target DBA schema tables.
Populate script: populate_snapshot_tables.ksh
Here is a sample line from crontab -l:
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /db2/support/scripts/populate_snapshot_tables.ksh -d dbname
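The full populate script is available from the link above. As a rough sketch only (table and schema names follow this article's examples; the real script adds connection handling, logging and error checking), its core step is one INSERT per history table, copying the current rows of the matching SYSIBMADM administrative view:

```shell
# Hypothetical core of populate_snapshot_tables.ksh: one INSERT per
# history table, copying the current snapshot rows from the matching
# SYSIBMADM administrative view. The statements are echoed here; in a
# live script each would be executed with:  db2 "$sql"
n=0
for tab in SNAPDB SNAPTAB SNAPAPPL SNAPSTMT; do
  sql="INSERT INTO DBA.${tab} SELECT * FROM SYSIBMADM.${tab}"
  echo "$sql"
  n=$((n + 1))
done
```

If you kept only a subset of columns in your history tables, name the columns explicitly in both the INSERT and the subselect instead of using SELECT *.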
Performance Metric Views
Included in this article are about forty views of database performance metrics. The two-column views all feature SNAPSHOT_TIMESTAMP as the first column, with the second column being the metric. These views are ideally suited for two-axis time-series charting and visualization.
We then present a few views (three or more columns) with more complex presentations of performance metrics, not suitable for an X-Y graph but useful for diagnosing specific performance issues.
Please see my References for sources and credits. All SQL in this article is original and is protected under a Creative Commons 3.0 license. (( I can also help you to adapt these database monitoring tools to your environment; see my consulting website for contact information. ))
I wrote these views for an environment in which there was only a single database per DB2 instance, so users with multiple databases per instance will have to make a small modification of some views. With multiple databases per instance, any views which interpret table DBA.SNAPDB, DBA.SNAPAPPL, or DBA.SNAPSTMT may need to be modified to add DB_NAME to select clauses, predicates and GROUP BY clauses ( If you are a DPF shop, you will also need to make a similar change for column DBPARTITIONNUM.).
To download ddl by individual view, go to the individual view entry below and click on the “[DOWNLOAD VIEW DDL]” link next to the view link. To download all DB2 view DDL at once as a ZIP archive, right-click on this link.
Performance Metrics Two-Column Views
Application and Connection Metrics
Applications Connected
Description: Number of connections (logical application sessions, application IDs) to the database, by snapshot timestamp.
Usage example:
select APPLS_CUR_CONS from dba.v_appls_cur_cons order by snapshot_timestamp desc fetch first 1 row only
Applications Executing
[DOWNLOAD VIEW DDL]
Description: Applications Executing Currently (within 5 minutes). Number of connected AND EXECUTING logical session threads, or Applications, at time of snapshot. Most connections will not be executing at any snapshot point in time.
Usage example:
select APPLS_IN_DB2 from dba.v_appls_in_db2 order by snapshot_timestamp desc fetch first 1 row only
New Connects Per Minute
[DOWNLOAD VIEW DDL]
Description: Number of new connects formed per minute.
Usage example:
select CONNECTS_PER_MINUTE from DBA.V_CONNECTS_PERMINUTE order by snapshot_timestamp desc fetch first 1 row only
Physical (tablespace I/O) Metrics
Asynchronous Write Percentage (AWP)
[DOWNLOAD VIEW DDL]
Description: Asynchronous data and index writes to tablespace (“physical”) as a percentage of total index and data physical writes. It is ALWAYS desirable to do a high percentage of Asynchronous Write I/O. Not only do Asynchronous Write I/Os avoid delaying the application connection, but they also tend to be 3 to 10 times faster depending on your disks and storage system.
It is desirable to have AWP greater than 90%. However, if your database has very little Insert, Update, or Delete activity, then achieving a high AWP will be difficult, if not impossible, to do. On the other hand, if your database has DMLTX (update/insert/delete per transaction) greater than or equal to 1, then achieving a high AWP value is very important. To achieve a high AWP, you need to have a sufficient number of I/O Cleaners (DB CFG NUM_IOCLEANERS) and not have the Changed Pages Threshold set too high (DB CFG CHNGPGS_THRESH). (( See Scott Hayes article. ))
How to calculate:
AWP = ( ( (Asynchronous pool data page writes + Asynchronous pool index page writes) * 100 ) / (Buffer pool data writes + Buffer pool index writes) )
Usage example:
select ASYNC_WRITE_PCT from DBA.V_ASYNC_WRITE_RATIO order by snapshot_timestamp desc fetch first 1 row only
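As a worked example of the AWP formula, with counter values assumed purely for illustration:

```shell
# AWP worked example with assumed counter values.
# async_* : asynchronous pool data/index page writes
# total_* : buffer pool data/index writes (all physical writes)
async_data=4500; async_index=3800
total_data=5000; total_index=4000
awp=$(( (async_data + async_index) * 100 / (total_data + total_index) ))
echo "AWP = ${awp}%"    # 8300 * 100 / 9000 -> 92 (integer arithmetic)
```

A value of 92% would satisfy the "greater than 90%" guideline above.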
Physical Synchronous Read Percentage (SRP)
[DOWNLOAD VIEW DDL]
Description: Indicates the number of data pages read in from the table space containers (physical) through synchronous processes, by database manager agents, as a percentage of total physical reads. Synchronous reads contrast with asynchronous reads performed by prefetcher engine dispatchable units (EDUs). In an OLTP workload a high synchronous read percentage is generally desirable, since it indicates random index-based access rather than large asynchronous prefetch scans.
SRP = 100 - ( (100 * (Asynchronous pool data page reads + Asynchronous pool index page reads)) / (Buffer pool data physical reads + Buffer pool index physical reads) )
Usage example:
select SYNC_READ_PCT from DBA.V_SYNC_READ_PCT order by snapshot_timestamp desc fetch first 1 row only
Logical and Physical Read Rate Per Minute
[DOWNLOAD VIEW DDL]
Description: Sum of:
- number of data pages which have been requested from the buffer pool (logical) for regular and large table spaces, and
- number of data pages read in from the table space containers (physical) for regular and large table spaces.
Logical + Physical Data Reads per minute. An indicator of overall read activity.
How to calculate:
Total read rate = ( bufferpool.data.logical.reads + bufferpool.data.physical.reads ) / time.interval
Usage example:
select PHYS_READ_IO_PER_MINUTE from DBA.V_PHYS_IO_READ_PERMINUTE order by snapshot_timestamp desc fetch first 1 row only
Bufferpool I/O Ratio Metrics
Bufferpool Overall Hit Ratio Percentage
[DOWNLOAD VIEW DDL]
Description: Percentage of data and index reads directly from bufferpool (as opposed to read from disk). Should be 98-100% most of the time.
How to calculate:
Overall hit ratio = ( 1 -
  (pool_data_p_reads + pool_xda_p_reads + pool_index_p_reads +
   pool_temp_data_p_reads + pool_temp_xda_p_reads + pool_temp_index_p_reads)
  / (pool_data_l_reads + pool_xda_l_reads + pool_index_l_reads +
     pool_temp_data_l_reads + pool_temp_xda_l_reads + pool_temp_index_l_reads)
) * 100
Usage example:
select OVERALL_HIT_RATIO from DBA.V_OVERALL_HIT_RATIO order by snapshot_timestamp desc fetch first 1 row only
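A worked example of the overall hit ratio formula, with physical and logical read totals assumed for illustration:

```shell
# Overall hit ratio worked example with assumed totals.
# p = sum of all physical reads (data + index + xda + temp counterparts)
# l = sum of all logical reads
p=1200; l=100000
ratio=$(awk -v p="$p" -v l="$l" 'BEGIN { printf "%.1f", (1 - p / l) * 100 }')
echo "Overall hit ratio = ${ratio}%"    # (1 - 0.012) * 100 = 98.8
```

A result of 98.8% would fall inside the 98-100% range suggested above.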
Bufferpool Data Hit Percentage
[DOWNLOAD VIEW DDL]
Description: Percentage of data reads that are from the bufferpool rather than from physical disk. A measure of how efficiently the bufferpool is being used.
How to calculate:
Data Hit Percentage = 100 - 100 * ( bp.data.physical.reads / bp.data.logical.reads )
Usage example:
select DHR_PCT from DBA.V_DATA_HIT_RATIO order by snapshot_timestamp desc fetch first 1 row only
Bufferpool Index Hit Percentage
[DOWNLOAD VIEW DDL]
Description: Percentage of time that index reads are from bufferpool (as opposed to physical, from disk). Like Data Hit Ratio, we want to keep this in the 98%-100% range.
How to calculate:
Index Hit Ratio Percentage = 100 - 100 * ( bp.index.physical.reads / bp.index.logical.reads )
Usage example:
select INDEX_HIT_RATIO from DBA.V_INDEX_HIT_RATIO order by snapshot_timestamp desc fetch first 1 row only
Dirty Page Threshold Cleaners Ratio Percentage (TCR)
Description:
How often a bufferpool page cleaner was invoked because a buffer pool had reached the dirty page threshold criterion for the database (dirty pages contain data that has been changed in the bufferpool, but not yet written to disk). The threshold is set by the chngpgs_thresh configuration parameter. It is a percentage applied to the buffer pool size. When the number of dirty pages in the pool exceeds this value, the cleaners are triggered.
If chngpgs_thresh is set too low, pages might be written out too early, requiring them to be read back in. If set too high, then too many pages may accumulate, requiring users to write out pages synchronously in a long burst, tying up other processing (with poor resulting performance).
If this ratio is low, it may indicate that you have defined too many page cleaners, or that the CHNGPGS_THRESH database configuration parameter is set too high. If this ratio is high, it may indicate that you have too few page cleaners defined. Either way, overall performance is impacted; bufferpool size is also a factor.
How to calculate:
TCR = 100 * ( dirty.page.threshold.cleaner.triggers / (dirty.page.steal.cleaner.triggers + lsn.gap.cleaner.triggers + dirty.page.threshold.cleaner.triggers) )
Usage example:
select THRESHOLD_CLEANS_RATIO from DBA.V_THRESHOLD_CLEANS_RATIO order by snapshot_timestamp desc fetch first 1 row only
Victim Cleans Ratio Percentage (VCR)
[DOWNLOAD VIEW DDL]
Description: Expressed as percentage of total page cleans, the number of times a page cleaner was invoked because a synchronous write was needed during the victim buffer replacement for the database. From a Quest document: “A victim page is a clean or dirty page in the buffer pool that is removed simply because DB2 needs to make room for incoming pages. If a victim page is a dirty page then the information must be written out to disk.”
How to calculate:
VCR = 100 * ( dirty.page.steal.cleaner.triggers / (dirty.page.steal.cleaner.triggers + lsn.gap.cleaner.triggers + dirty.page.threshold.cleaner.triggers) )
You don’t want this over 40%, but also not too low. If this ratio is low, it may indicate that you have defined too many page cleaners. If your chngpgs_thresh is set too low, you may be writing out pages that you will dirty later; aggressive cleaning defeats one purpose of the buffer pool, that is, to defer writing to the last possible moment. If this ratio is high, it may indicate that you have too few page cleaners defined, or that your bufferpool is too small. Too few page cleaners will also increase recovery time after failures.
Usage example:
select VICTIM_CLEANS_RATIO from DBA.V_VICTIM_CLEANS_RATIO order by snapshot_timestamp desc fetch first 1 row only
Transactional Metrics
Bufferpool Logical Index Reads Per Transaction (BPLITX)
[DOWNLOAD VIEW DDL]
Description: Can help detect the existence of Index Page Leaf Scans. (( Scott Hayes says:
Let’s say you have an index on columns A and B. If you have an SQL query that says WHERE B = ‘some value’ but does not include search criteria for column A, then it is highly likely that DB2 will asynchronously prefetch the leaf pages of the index to scan for occurrences of B = ‘some value’. If there are 1000’s of leaf pages in a 3 or 4 level index, this can be a very CPU, and possibly I/O, expensive operation.
))
Usage example:
select BPLI_PER_TRANSACTION from DBA.V_BPLITX order by snapshot_timestamp desc fetch first 1 row only
BP Logical Reads Per Transaction (BPLRTX)
[DOWNLOAD VIEW DDL]
Description: Bufferpool logical data and index reads per transaction.
How to calculate:
BPLRTX = (Buffer Pool Data Logical Reads + Buffer Pool Index Logical Reads) / (Commit statements attempted + Rollback statements attempted)
The idea is to get BPLRTX as low as possible. (( Scott Hayes suggests:
To get an idea of your ‘best case goal’ for BPLRTX, we need to look at a few data points:
- The average number of SELECTS per Transaction (SELTX)
- The average number of Insert, Update, and Deletes per Transaction (DMLTX)
- The average number of Statements per Transaction (STMTTX) = SELTX + DMLTX
- The average number of Index NLEVELS in the database: Select avg(NLEVELS) from SYSCAT.INDEXES
- Round the average NLEVELS up to a whole number, giving AVGNLEVEL
Finally, your ‘best case goal’ for BPLRTX is computed by the formula:
BPLRTX GOAL = STMTTX x (AVGNLEVEL + 1)
This goal provides the minimum possible, best-case value for BPLRTX by multiplying the average number of statements per transaction by the (rounded-up) average index NLEVELS plus 1, the extra 1 covering a logical read of the data page. By way of example, if an average transaction does 8 SELECT statements and 2 DML statements, and if the average NLEVEL is 3, then the best-case BPLRTX GOAL = (8+2) x (3+1) = 40 bufferpool logical reads per transaction.
Now, let’s be honest with each other. The world isn’t perfect, the application’s SQL is probably far from perfect, and the database’s indexes aren’t perfect either. So a realistic goal would add a 50% slop factor to the ideal goal, giving us a BPLRTX of 60 in the previous example.
))
Usage example:
select BPLR_PER_TRANSACTION from DBA.V_BPLRTX order by snapshot_timestamp desc fetch first 1 row only
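The goal formula above can be checked against the article's own figures (8 SELECTs and 2 DML statements per transaction, rounded-up average NLEVELS of 3):

```shell
# BPLRTX goal worked example using the figures from the text.
SELTX=8; DMLTX=2            # SELECTs and DML statements per transaction
AVGNLEVEL=3                 # rounded-up average index NLEVELS
STMTTX=$((SELTX + DMLTX))   # statements per transaction
goal=$((STMTTX * (AVGNLEVEL + 1)))   # best-case goal
realistic=$((goal + goal / 2))       # add the 50% slop factor
echo "best-case BPLRTX goal = $goal, realistic goal = $realistic"
```

This reproduces the best-case goal of 40 and the realistic goal of 60 quoted above.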
Sorts Per Transaction (SRTTX)
[DOWNLOAD VIEW DDL]
Description: Total sorts divided by number of transactions (units of work), i.e. the number of sorts performed before a commit is taken.
Usage example:
select SORTS_PER_TRANSACTION from DBA.V_SORTS_PER_TRANSACTION order by snapshot_timestamp desc fetch first 1 row only
Data Manipulation Language Statements per Transaction (DMLTX)
[DOWNLOAD VIEW DDL]
Description: Update/delete/insert statements per transaction (no Selects counted). This graph helps us understand patterns in application update behavior.
Usage example:
select DMLTX from DBA.V_DMLTX order by snapshot_timestamp desc fetch first 1 row only
Select/Insert/Update/Delete Statements per Transaction (SIUD_SQL_STMTS_TX)
[DOWNLOAD VIEW DDL]
Description: Select/update/delete/insert statements per transaction (i.e. all SELECT and DML statements).
Usage example:
select SIUD_SQL_STMTS_TX from DBA.V_STMTS_TX order by snapshot_timestamp desc fetch first 1 row only
Select Statements per Transaction (SELECT_STMTS_TX)
[DOWNLOAD VIEW DDL]
Description: Select statements (only) per transaction.
Usage example:
select SELECT_STMTS_TX from DBA.V_SELECT_STMTS_TX order by snapshot_timestamp desc fetch first 1 row only
Transactions Per Minute (TPM)
[DOWNLOAD VIEW DDL]
Description: Transactions per minute. An indication of whether or not this is a high-transaction-volume database.
Usage example:
select UOW_PER_MINUTE from DBA.V_UOW_PERMINUTE order by snapshot_timestamp desc fetch first 1 row only
Locking Metrics
Lock Escalations since last snapshot
[DOWNLOAD VIEW DDL]
Description: Number of lock escalations since the last snapshot (last 5 minutes). I was dealing with a database with very rare lock escalations, so I decided not to do a per-minute calculation but simply to show the number of lock escalations since the last snapshot.
Usage example:
select LOCKESCALS from DBA.V_LOCK_ESCAL order by snapshot_timestamp desc fetch first 1 row only
Exclusive (X) Lock Escalations Since Last Snapshot
[DOWNLOAD VIEW DDL]
Description: An exclusive (X) lock escalation can cause locking contention, so seeing this happen on a frequent basis helps us to tune both the application and table locking parameters.
Usage example:
select X_LOCKESCALS from DBA.V_XLOCK_ESCAL order by snapshot_timestamp desc fetch first 1 row only
Lock Timeouts since last snapshot
[DOWNLOAD VIEW DDL]
Description: Number of lock timeouts (SQLCODE -911, reason code 68) since the last snapshot (last 5 minutes). As with lock escalations, I was dealing with a database with very rare lock timeouts, so I decided not to do a per-minute calculation but simply to show any that had occurred at all; any non-zero value was of concern.
Usage example:
select LOCK_TIMEOUTS from DBA.V_LOCK_TIMEOUTS order by snapshot_timestamp desc fetch first 1 row only
Deadlocks since last snapshot
[DOWNLOAD VIEW DDL]
Description: Number of deadlocks (SQLCODE -911, reason code 2) since the last snapshot (last 5 minutes, or your monitoring interval).
Usage example:
select DEADLOCKS from dba.v_deadlocks order by snapshot_timestamp desc fetch first 1 row only
LockWait Time Seconds Per Minute
[DOWNLOAD VIEW DDL]
Description: Total lockwait time for all applications expressed in seconds per minute.
Usage example:
select LOCKWAIT_TIME_PER_MINUTE from DBA.V_LOCKWAIT_TIME_PERMINUTE order by snapshot_timestamp desc fetch first 1 row only
Lock Waits Per Minute
[DOWNLOAD VIEW DDL]
Description: Total lockwaits for all applications per minute.
Usage example:
select LOCKWAITS_PER_MINUTE from DBA.V_LOCKWAITS_PERMINUTE order by snapshot_timestamp desc fetch first 1 row only
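The per-minute views are presumably derived from the change in a cumulative counter between consecutive snapshots, divided by the interval length. A worked illustration with assumed counter values:

```shell
# Per-minute rate worked example with assumed counter values: the
# cumulative lock waits counter at the previous and current snapshots,
# taken 5 minutes apart.
prev=1000; curr=1600
interval_min=5
rate=$(( (curr - prev) / interval_min ))
echo "lock waits per minute = $rate"    # (1600 - 1000) / 5 = 120
```

The same delta-over-interval pattern applies to all of the per-minute metrics in this article.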
Sort Metrics
Sort Time Seconds Per Minute
[DOWNLOAD VIEW DDL]
Description: Total sort time, all applications, in seconds per minute. A good indicator of when sort-intensive processes are running.
Usage example:
select SORT_TIME_PER_MINUTE from DBA.V_SORT_TIME_PERMINUTE order by snapshot_timestamp desc fetch first 1 row only
Sorts Per Minute
[DOWNLOAD VIEW DDL]
Description: Total number of sorts, all applications, per minute.
Usage example:
select SORTS_PER_MINUTE from DBA.V_SORTS_PERMINUTE order by snapshot_timestamp desc fetch first 1 row only
Logging Metrics
Log Cleans Ratio Percentage (LCR)
[DOWNLOAD VIEW DDL]
Description: Number of LSN gap cleaner triggers as a percentage of total cleaner triggers. From the IBM DB2 Version 9 InfoCenter:
When the amount of log space encompassing the log record which has updated the oldest page in the buffer pool and the current log position exceeds that allowed by the softmax parameter, it is said that the database is in an ‘LSN gap’ situation. Under the default method of page cleaning, when the logger detects that an LSN gap has occurred, it will trigger the page cleaners to write out all the pages which are contributing to the LSN gap situation. That is, it will write out those pages which are older than what is allowed by the softmax parameter. Page cleaners will be idle for some period of time while no LSN gap is occurring. Then, once an LSN gap occurs, the page cleaners are activated to write a large number of pages before going back to sleep. This can result in the saturation of the I/O subsystem, which then affects other agents which are reading or writing pages.
When LSN gap cleaner triggers dominate (LCR approaches 100%), check for performance issues on the database. If you have configured very large logs, softmax may need to be lowered in steps until the LSN gap cleaner triggers more frequently. If there is a strong correlation with performance problems, it might be necessary to use the DB2 proactive page cleaning feature, the DB2_USE_ALTERNATE_PAGE_CLEANING registry setting.
How to calculate:
LCR = 100 * ( lsn.gap.cleaner.triggers / ( dirty.page.steal.cleaner.triggers + lsn.gap.cleaner.triggers + dirty.page.threshold.cleaner.triggers ) )
Usage example:
select LOG_CLEANS_RATIO from DBA.V_LOG_CLEANS_RATIO order by snapshot_timestamp desc fetch first 1 row only
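The three cleaner-trigger ratios in this article (VCR, TCR and LCR) share the same denominator, so they sum to 100%. A worked example with assumed trigger counts:

```shell
# Cleaner-trigger ratios worked example with assumed counts.
steal=30       # dirty page steal (victim) cleaner triggers
threshold=20   # dirty page threshold cleaner triggers
lsn_gap=50     # LSN gap cleaner triggers
total=$((steal + threshold + lsn_gap))
vcr=$((100 * steal / total))       # Victim Cleans Ratio
tcr=$((100 * threshold / total))   # Threshold Cleans Ratio
lcr=$((100 * lsn_gap / total))     # Log (LSN gap) Cleans Ratio
echo "VCR=${vcr}% TCR=${tcr}% LCR=${lcr}%"
```

Because the ratios partition the same total, a shift in one trigger type always shows up as a drop in the others.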
Log Write IO per Minute
[DOWNLOAD VIEW DDL]
Description: The number of I/O requests issued by the logger for writing log data to the disk, per minute. This is an indicator of logging activity, and can be used in conjunction with other graphs to determine whether logging is responsible for application performance issues. Can also point toward heavy update periods.
Usage example:
select NUM_LOG_WRITE_IO_PER_MINUTE from DBA.V_LOG_WRITE_IO_PERMINUTE order by snapshot_timestamp desc fetch first 1 row only
Total Log Used (MB)
[DOWNLOAD VIEW DDL]
Description: Total log space used, in megabytes. This can show us problematic or dangerously large units of work, and can also warn that we are about to exhaust log space. It also allows you to better visualize the growth and contraction of log usage over a period of time.
Usage example:
select TOTAL_LOG_USED_MB from DBA.V_TOTAL_LOG_USED_MB order by snapshot_timestamp desc fetch first 1 row only
Miscellaneous 2-Column Metrics
SQL Execution Time Seconds Per Minute
[DOWNLOAD VIEW DDL]
Description: SQL Execution Time in seconds per minute.
Usage example:
select ELAPSED_EXEC_TIME_S_PER_MINUTE from DBA.V_SQL_EXEC_TIME_PERMINUTE order by snapshot_timestamp desc fetch first 1 row only
Rows Read Per Minute
[DOWNLOAD VIEW DDL]
Description: Rows read from all tables, per minute. Not the same as rows selected: for example, a poorly performing SQL statement might have to read 10,000 rows of a table in order to return only one row in a SELECT statement.
Usage example:
select ROWS_READ_PER_MINUTE from DBA.v_rows_read_perminute order by snapshot_timestamp desc fetch first 1 row only
3+ Column Performance Views
RR/RS Ratio (RRRSR)
[DOWNLOAD VIEW DDL]
Description: Rows Read / Rows Selected Ratio (RRRSR) by agents’ ID. RRRSR indicates how many table rows were read, total, as a ratio to number of rows returned in all SELECT statements executed by that agent during its lifetime. This can be an indicator of poor SQL performance.
Table Describe:
describe table dba.v_rr_rs_ratio

Column name                    Type schema Type name          Length   Scale Nulls
------------------------------ ----------- ------------------ -------- ----- ------
SNAPSHOT_TIMESTAMP             SYSIBM      TIMESTAMP          10       0     No
AGENT_ID                       SYSIBM      BIGINT             8        0     No
APPL_RR_RS_RATIO               SYSIBM      DECIMAL            18       2     Yes
RR/RS Ratio (RRRSR) Aggregates
[DOWNLOAD VIEW DDL]
Description: Maximum and Average value found of Rows Read to Rows Selected Ratio (RRRSR) for all agents sampled at this interval. RRRSR indicates how many table rows were read, total, as a ratio to number of rows returned in all SELECT statements executed by that agent during its lifetime. This metric is an aggregate among all agents alive at that snapshot. The maximum is a good indicator of performance spikes.
This view is dependent on view dba.v_rr_rs_ratio (above).
Table Describe:
describe table dba.v_rr_rs_ratio_aggregate

Column name                    Type schema Type name          Length   Scale Nulls
------------------------------ ----------- ------------------ -------- ----- ------
SNAPSHOT_TIMESTAMP             SYSIBM      TIMESTAMP          10       0     No
APPL_RR_RS_RATIO_AVG           SYSIBM      DECIMAL            18       2     Yes
APPL_RR_RS_RATIO_MAX           SYSIBM      DECIMAL            18       2     Yes
Statements with High RR/RS Ratio (RRRSR)
[DOWNLOAD VIEW DDL]
Description: All captured statements which match applid/timestamp with at least 80% of the maximum RR_RS ratio for that snapshot interval. The view should show you some of the worst-performing SQL captured at your snapshot interval. Because snapshot microseconds might not match between administrative views SNAPSTMT and SNAPAPPL, the SQL allows a join between the two capture tables with a 2-second variance, which can be adjusted for your own implementation. The 80%-of-max value can also be easily modified for your use.
Table Describe:
describe table dba.v_stmt_high_rr_rs_ratio

Column name                    Type schema Type name          Length   Scale Nulls
------------------------------ ----------- ------------------ -------- ----- ------
SNAPSHOT_TIMESTAMP             SYSIBM      TIMESTAMP          10       0     No
RR_RS_RATIO                    SYSIBM      DECIMAL            18       2     Yes
STMT_TEXT                      SYSIBM      CLOB               16777216 0     Yes
Rows Read per minute by Table
[DOWNLOAD VIEW DDL]
Description: By table, rows read per minute per snapshot interval.
Table Describe:
describe table dba.v_table_rows_read_perminute

Column name                    Type schema Type name          Length   Scale Nulls
------------------------------ ----------- ------------------ -------- ----- ------
SNAPSHOT_TIMESTAMP             SYSIBM      TIMESTAMP          10       0     No
TABSCHEMA                      SYSIBM      VARCHAR            128      0     No
TABNAME                        SYSIBM      VARCHAR            128      0     No
TABLE_ROWS_READ_PER_MINUTE     SYSIBM      DECIMAL            18       2     Yes
Rows Written per minute by Table
[DOWNLOAD VIEW DDL]
Description: By table, rows written per minute per snapshot interval.
Table Describe:
describe table dba.v_table_rows_written_perminute

Column name                    Type schema Type name          Length   Scale Nulls
------------------------------ ----------- ------------------ -------- ----- ------
SNAPSHOT_TIMESTAMP             SYSIBM      TIMESTAMP          10       0     No
TABSCHEMA                      SYSIBM      VARCHAR            128      0     No
TABNAME                        SYSIBM      VARCHAR            128      0     No
TABLE_ROWS_WRITTEN_PER_MINUTE  SYSIBM      DECIMAL            21       2     Yes

4 record(s) selected.
Prune snapshot historical tables
I recommend you create the performance tables in their own DMS tablespace, so that growth in these tables does not endanger space requirements for production data. In any case, you will need to prune data, probably on a weekly basis. The following script can be used as a template for your own prune process. Remember to also REORG / RUNSTATS tables periodically.
Example crontab entry for database dbname, purging rows more than 90 days old; runs Monday morning just after midnight:
01 00 * * 1 /db2/support/scripts/purge_dba_snapshot_tables.ksh -d dbname -n 90
Download the snapshot history prune script here:
purge_dba_snapshot_tables.ksh
Or view the script inline:
#!/bin/ksh
# purge_dba_snapshot_tables.ksh
# purge DBA schema snapshot capture tables
#=========================================================================
# Parameters:
# -d : Required. The name of the database against which the SQL will
# run.
# -n : Number of days back to prune
#=========================================================================
scriptStatusLog=/db2/support/scripts/purge_dba_snapshot_tables.log
# edit prfile to your own local configuration.
prfile=/home/db2/sqllib/db2profile
db2options="-av"
function Usage {
echo 'Usage: purge_dba_snapshot_tables.ksh -d <dbname> -n <days>'
echo 'All parameters are required. '
echo 'Use this script to purge snapshot table data older than n days.'
}
if [[ "$#" -lt "4" ]]; then
Usage
exit 1
fi
while getopts d:n: OPT
do
case $OPT in
d) DbName=$OPTARG ;;
n) PurgeDays=$OPTARG ;;
*) Usage; echo "$(date) Usage error." >> $scriptStatusLog ; exit 1 ;;
esac
done
if [ -f "$prfile" ]; then
. $prfile
else
echo "$(date) Cannot locate $prfile." >> $scriptStatusLog
Usage
exit 1
fi
echo "$(date) Now executing purge_dba_snapshot_tables.ksh with following options:" >> $scriptStatusLog
echo "DbName = $DbName " >> $scriptStatusLog
echo "Purge Days = $PurgeDays " >> $scriptStatusLog
echo "$(date) Now connecting . . . " >> $scriptStatusLog
db2 connect to $DbName >> $scriptStatusLog 2>&1
echo "$(date) Now executing purge . . . " >> $scriptStatusLog
db2 $db2options "delete from dba.snapdb where snapshot_timestamp < current timestamp - $PurgeDays days" >> $scriptStatusLog 2>&1
db2 $db2options "delete from dba.snapappl where snapshot_timestamp < current timestamp - $PurgeDays days" >> $scriptStatusLog 2>&1
db2 $db2options "delete from dba.snapstmt where snapshot_timestamp < current timestamp - $PurgeDays days" >> $scriptStatusLog 2>&1
echo "$(date) purge_dba_snapshot_tables completed with RC = $? " >> $scriptStatusLog
db2 connect reset > /dev/null 2>&1
db2 terminate > /dev/null 2>&1
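The REORG/RUNSTATS maintenance mentioned above can follow the same template. A minimal sketch that generates the commands for the three capture tables (the dba.* table list mirrors the purge script's assumptions; pipe the output to db2 -v after connecting to actually run it):

```shell
#!/bin/ksh
# Generate weekly maintenance commands for the snapshot capture tables.
# The dba.* table names are the same assumptions as in the purge script;
# adjust the list to match your own capture set.
for t in snapdb snapappl snapstmt; do
  echo "reorg table dba.$t"
  echo "runstats on table dba.$t with distribution and detailed indexes all"
done
```

Keeping the script list-driven means adding a new capture table is a one-line edit.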
References
Some good articles on use of snapshot data for performance analysis:
- IBM DB2 v9 InfoCenter Key References
- All Supported Administrative SQL Routines and Views. Probably the most succinct and complete list of DB2 administrative views.
- Snapshot monitor SQL Administrative Views and Administrative Convenience Views (derived, calculated values)
- Database system monitor data organization. The most useful overview of how the data gathered by snapshots is organized.
- Administrative views versus table functions. When to still use table functions; they still have a role.
- IBM DB2 v9 InfoCenter Cool Administrative Convenience Views
- TOP_DYNAMIC_SQL administrative view. Returns the top dynamic SQL statements, sortable by number of executions, average execution time, number of sorts, or sorts per statement.
- TBSP_UTILIZATION administrative view. Returns table space configuration and utilization information; a quick peek at page/space consumption.
- Helpful Scott Hayes articles from IBMDatabaseMagazine.com
- DB2 LUW Performance: Asked and Answered (BPLRTX). More on BPLRTX, buffer pool logical reads per transaction; also NUMBLOCKPAGES. Suggests a guideline for an ideal BPLRTX.
- DB2 LUW Performance: The Most Important Cost. Buffer pool logical reads per transaction.
- DB2 LUW Performance: Progress Review plus Closing Files. Another list of performance measures, and a good index page from which to search further; more transaction-based references.
- DB2 LUW Performance: Detecting Index Leaf Page Scans. Index leaf page scans, BPLITX. “While perhaps not as costly as data page scans, index leaf page scans can quickly suck your CPUs dry of processing capacity.”
- DB2 LUW Performance: Index Read Efficiency (IREF). IREF = Rows Read / Rows Selected (Fetched).
- DB2 LUW Performance: Table Read I/O and Overflows. “TBRRTX should be computed for each table by dividing the number of Rows Read by the number of database Transactions . . . TBROVP – The percentage of Overflows over Rows Read = Overflows * 100 / Rows Read (+1 if you want to avoid division by zero).”
- DB2 LUW Performance: Catalog Cache. “Catalog Cache Hit Ratio (CATHR) = 100 – ( Catalog cache inserts * 100 / Catalog cache lookups )” A metric that didn’t interest me at my current site, but may be useful to you.
- DB2 LUW Performance: Synchronous Read Percent (SRP). “Synchronous Read Percent – should be 90% or better for OLTP; the lower the number, the more you need to re-examine your tuning decisions. For DW, 50% might be great.”
- DB2 LUW Performance: Write I/O Optimization. Asynchronous write percentage.
- DB2 LUW Performance: Tuning LOGBUFSZ.
- Useful metrics docs from Quest Software
- Understanding Buffer Pool Performance and Tuning in DB2 UDB v8.1 Multi-platform Environments (PDF) By Scott Olson and Joshua Steffan
- Understanding Buffer Pool Performance and Tuning in DB2 UDB LUW 8.2 (PDF), by Jeff Brokaw. The various buffer pool ratio equations can be found from page 28 of the PDF onward: data and index hit ratios, physical I/O read rate, prefetch ratio, log cleans ratio, threshold cleans ratio, and victim cleans ratio.
