
DBA Holistic Scheduler Consciousness

26-Dec-08

No, this isn’t a tutorial on transcendental meditation, though it does call for certain Zen-like qualities. Being good at any job means knowing what to look for – an artist sees one thing, a photographer another. There’s a vast amount of information available, and we filter for what is important to what we do.

Every production DB2 environment is likely to include scheduled processes that cause heavy resource contention and/or heavy logging. These processes may be DB2 utilities: backup, reorg or runstats. The DBA may have a task that prunes tables, with resulting heavy locking and logging. The production application may have a variety of heavy updates or long-running scheduled SQL.

Scheduling may either be via cron or a third party tool such as BMC’s Control-M. Your database backups may be handled via a third party tool such as Symantec NetBackup. Your production application may have a homegrown scheduler. Some scheduled processes may be undocumented due to an administrator leaving the shop, or the shop inheriting an undocumented system.

Scheduled, high contention processes can enter into conflict because of:

  • Poor team communication. For example, you may have a NetBackup administrator, a Control-M admin, an application administrator, and cron processes, all managed by different people who don’t actively communicate changes across the enterprise. Or, people are laid off and do not transition their work to the new administrator.
  • Lack of process schedule review by administrators – not reviewing all active scheduler interfaces on a regular basis or before special database maintenance. This can easily happen when the DBA is under an impending deadline, or when the organization has a complex process schedule with several heterogeneous front ends.
  • Poor visibility of scheduled utilities, meaning:
    • The DBA is not allowed access to the scheduler interface.
    • The DBA is unable to contact scheduler administrators.
    • Organizational spreadsheets are out of date, or there is no enterprise schedule documentation.
    • Too many schedulers are in use by your company.
    • The production application may be an undocumented black box, so developers and admins don’t know which processes are scheduled.
    • Schedules are so complex that they are not easily reconciled.

A good database administrator develops an awareness of the large number of scheduled processes running day and night that impact the database – inside her team’s purview, but possibly cutting into the jurisdiction of other departments. To gain this awareness, the DBA in a large organization needs both technical skill and social/political savvy. As you can see, database processes can be initiated by application teams, by system administrators and by DBAs, not all of whom are necessarily talking to each other.

Lack of awareness of scheduled processes can cause particular pain when a production database DDL update is underway. Backup, reorg or runstats can stall any ALTER TABLE statements. Some ALTER TABLE statements require that a REORG be done afterwards to make the table accessible; if other utilities such as online backup are running, a REORG may not be possible until the other utility is complete.

It is important that the database administrator have a proactive personality and seek out all enterprise scheduler administrators. The DBA should reach out to other admins first, not wait for others to push information to her. You should push all your special database maintenance tasks and schedule changes to ALL system administrators whose work impacts your database. Announce special database maintenance or a new schedule at least 48 hours in advance, to give other administrators a chance to respond with feedback.

Documentation: If you do not have a unified scheduler your entire administration staff should have a single Wiki or other collaborative database for documenting database processes. That wiki or spreadsheet needs to be updated regularly. This can be difficult if your organizational culture does not reward documentation work. You have to be a leader.

Aggressive Research: If you have a historical performance monitoring system, studying the resulting graphs may identify heavy update tasks or even utility processes that you were unaware of. In a large organization, it is wise to be open to the possibility that there are “schedule zombies”: undocumented, unmaintained and (possibly) unnecessary processes. This is especially true if you have inherited an undocumented system; in that case, studying historical database performance may be the only way to understand what is running behind the front end.

Suspend AND Re-Activate. When scheduled processes need to be suspended for special database maintenance, remember to reactivate them afterward. Yes, this is common sense, but it is very easy to forget. Every suspension of a process needs to be accompanied by a reactivation of that process in your implementation plan or hour-by-hour task list.

Good luck and may the Force be with you!

Unix Housekeeping for the DB2 DBA

19-Dec-08

Active Unix DB2 databases generate and store substantial support information outside of the database itself, in flat text files and binary dumps. If, as a Unix DBA, you do your administration work largely via Unix shell sessions, you probably have files comprising SQL, data exports, scripts, generated reports and performance monitoring output, all of which need maintenance. We must, in the course of our work, monitor, maintain and prune those files. File maintenance, in my experience, is one of the most commonly neglected parts of a DBA’s work, often because it is not explicitly delegated; we all assume someone else is going to do it. DB2-related Unix directories form part of the collective memory of the DB2 environment and are valuable for history, documentation, training, team communication and the reuse of components.

Inattention to Unix file cleanup may result in:

  • A disk-full condition.
  • Violation of other Unix limits, for example ARG_MAX (of limits.h) which can cause shell commands to error, making maintenance a bigger headache.
  • Clutter, making it difficult to find your more important reports and other files.
  • More difficulty browsing and editing files that have grown too large.
  • Difficulty grep’ing, running awk etc. or browsing time-ordered data in logs that have grown too large.
  • Danger that important files may be deleted if emergency maintenance is required.
  • Overgrown support directories whose contents become undecipherable, making the resource hard to use and manage as a team. This hampers both on-call turnover and the training of new DBAs, who again may delete files that should not be removed.
  • If you have to run the db2support utility, lack of maintenance in the db2dump directory may create an excessively large db2support.zip file which IBM engineers will find difficult to parse.

Maintenance of many of these files is often the subject of internal shop standards, so be sure you familiarize yourself with and follow the advice of your own database operations department before implementing any of my advice. Retention of old logs may also be mandated by legal guidelines – again check with your management.


DB2-generated files

Regardless of whether you manage your databases remotely or locally on the server, DB2 itself will generate a number of files during normal operation, and the administrator is responsible for maintaining and pruning those files. The majority of those files will be located in the directory [db2home]/sqllib/db2dump. The various files found in the db2dump directory are explained in the DB2 documentation.

If your databases are sufficiently active, you should rename the administration notification logs and DB2 diagnostic logs every day around midnight (for example mv db2diag.log db2diag.log.20081101 ) to make managing the diagnostic output easier; for less active systems this can also be done on a weekly or monthly basis. This is typically done by a command or script run via cron. I suggest renaming to [filename].YYYYMMDD. Splitting your files in this manner will make research easier. When searching for old diagnostic data, you can easily grep a specific range of dates, for example:

grep -i lock db2diag.log.2008110?

These files can be compressed and uncompressed by wildcards; it is then a simple matter to uncompress the few days of diagnostic logs you need, again using wildcards, then grep on a wildcard to restrict the days you are searching. Contrast that with the difficulty of using a single very large diagnostic log spanning weeks of activity.
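A minimal cron-driven rotation script might look like the following sketch; the instance home path and instance name (db2inst1) are examples, not standard locations, so substitute your own:

```shell
#!/bin/ksh
# Nightly rename of the DB2 diagnostic and notification logs so that
# fresh ones are started each day. Adjust DIAGDIR and the .nfy file
# name for your instance.
DIAGDIR=/home/db2inst1/sqllib/db2dump
STAMP=$(date +%Y%m%d)

for LOG in db2diag.log db2inst1.nfy; do
    if [ -f "$DIAGDIR/$LOG" ]; then
        mv "$DIAGDIR/$LOG" "$DIAGDIR/$LOG.$STAMP"
    fi
done
```

A crontab entry such as 59 23 * * * /db2/support/scripts/rotate_diag.ksh would run it just before midnight.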

Your two main candidates for regular renames and cron-scheduled maintenance are db2diag.log and the administration notification log [instance name].nfy. If you are running the DB2 Administration Server (DAS), you will see DAS diagnostic logs ( db2dasdiag.log ) which should be scheduled for the same sort of maintenance.

Once you have a daily maintenance strategy in place, you will need to maintain those log files long-term. You may prune (delete) old files, move them to another archive directory, archive them to another device or download them to your laptop or workstation; or any combination of the above. Whatever you do, keep the db2dump clear of clutter. This does not require deletion of files. It can be useful to retain your db2diag.log and administration notification log indefinitely for research. If archival is the plan, a separate archive directory structure should be created, perhaps with subdirectories hierarchically organized by month and year. Do not retain old logs in the db2dump directory. If you don’t delete, move.
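Here is one way to sweep dated logs out of db2dump into such an archive tree, organized by year and month; the db2dump path, archive root and 30-day cutoff are all hypothetical, so adjust to your shop's retention standards:

```shell
#!/bin/ksh
# Move dated diagnostic logs older than 30 days out of db2dump into
# an archive tree organized /archive/YYYY/MM, compressing as we go.
DIAGDIR=/home/db2inst1/sqllib/db2dump
ARCHIVE=/db2/support/logs/archive

find "$DIAGDIR" -name 'db2diag.log.*' -mtime +30 | while read -r F; do
    STAMP=${F##*.}                         # e.g. 20081101
    YEAR=$(echo "$STAMP" | cut -c1-4)
    MONTH=$(echo "$STAMP" | cut -c5-6)
    mkdir -p "$ARCHIVE/$YEAR/$MONTH"
    gzip "$F"
    mv "$F.gz" "$ARCHIVE/$YEAR/$MONTH/"
done
```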

Other files that will be generated by DB2, but with irregular filenames, directory locations and frequency, are:

  • callout script (db2cos) files, dump, trap and core files.
  • DB2 Event Logs db2eventlog.xxx (xxx = 000, 001, for multiple partitions)
  • stmmlog directory, which contains Self-Tuning Memory Manager logs. You may want to compress, archive or prune these logs.
  • If you have opened a PMR with IBM, you may also have db2support output (db2support*zip) in your db2dump directory. These files can be quite large, and should be stored after creation somewhere other than in db2dump.

As with your db2diag.log, these files should eventually be either deleted or moved to an archive directory or another device. This is probably not something you will want to automate, since many of these files demand immediate human attention and a decision. If you decide not to open a PMR with IBM about the problem behind generated dump, trap and core files, they may be deleted after whatever interval your shop considers safe. Logs, callout script files, dump, trap and core files associated with an open PMR should probably be retained, along with the db2support file for that PMR, for as long as the PMR is open.

When DB2 generates these files it is usually indicative of a problem, so the db2dump directory should be scanned routinely for new files just as the db2diag.log should be browsed regularly. This can be done either manually or via cron scripts with email notification.
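A simple cron-driven scan might look like this sketch; the mailx invocation and recipient address are placeholders for whatever notification mechanism your shop actually supports:

```shell
#!/bin/ksh
# List db2dump files created or modified in the last 24 hours and
# mail the list to the team so new dumps get human attention.
DIAGDIR=/home/db2inst1/sqllib/db2dump
NEWFILES=$(find "$DIAGDIR" -type f -mtime -1 2>/dev/null)

if [ -n "$NEWFILES" ]; then
    echo "$NEWFILES" |
        mailx -s "New db2dump files on $(hostname)" dba-team@example.com
fi
```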

DBA Support Files

The DBA who works mostly via shell sessions and command-line utilities will probably store many files related to her work on the DB2 server. It is important that the DB2 support directory be secured via Unix permissions against access by developers and users outside of the DBA team. Non-DBAs should not be able to remove any file or directory, or write new files or subdirectories under your hierarchy. In an organization with multiple database servers, it can be advantageous to create at least some of the DBA support directories on a shared volume to make sharing and maintenance of files easier.
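One way to lock the tree down, assuming the DBA team shares a Unix group (db2adm here is an invented group name):

```shell
# Give the support tree to the DBA group and shut out everyone else:
# directories traversable and writable by the group, files readable
# and writable by the group, nothing for "other" users.
chgrp -R db2adm /db2/support
find /db2/support -type d -exec chmod 770 {} \;
find /db2/support -type f -exec chmod 660 {} \;
```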

This is a typical directory structure for DB2 support files; following each directory name is the typical kind of data you might find there (identified by a common file suffix).

  • /db2/support/data
    • *.ixf – IXF data export
    • *.del – Delimited data export
    • *.msg – IMPORT/EXPORT/LOAD message files
  • /db2/support/ibm_db2support_files
  • /db2/support/scripts
    • *.lst – Lists used as script drivers
    • *.ksh – Korn shell script
  • /db2/support/ddl
    • *.ddl – Data Definition Language statements, for example CREATE TABLE
    • *.cfg – Configuration statements, for example UPDATE DBM CFG
  • /db2/support/logs
  • /db2/support/sql
    • *.sql – SQL, queries and updates
  • /db2/support/evmon
    • *.evt – event monitor data files
    • *.ctl – event monitor data file
  • /db2/support/output
    • *.out – report output

Any directory is also likely to contain:

  • *.gz, *.Z – compressed files
  • *.txt – Text, documentation, README
  • *.jnk, *.tmp – Whenever I create scratch files that I know can be deleted immediately, such as intermediate pipe output, I name them “scratch”, “junk”, or anything with a .jnk or .tmp suffix. That makes finding and deleting these files very easy.

IDENTIFY OBSOLETE. Determine whether reports and scripts are obsolete; to make it easier to find active project files, create an “obsolete” subdirectory wherever appropriate, and move files you have tagged as inactive versions there.
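In practice this can be as simple as the following (the script names are invented for illustration):

```shell
# Park superseded script versions where they no longer clutter the
# listing but remain available for reference.
cd /db2/support/scripts
mkdir -p obsolete
mv reorg_all_v1.ksh reorg_all_v2.ksh obsolete/
```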

DATA TYPE SHOULD MATCH DIRECTORY. You will want to make sure that files are in the appropriate directory to make management easier – for example move report output .out files to the output directory. It is very easy to clutter frequently used directories such as scripts, sql and ddl with output and data export files resulting from your work.

You can move files by extension very easily to the correct directory by a command similar to this:

find /db2/support -name "*.ixf*" | grep -v "/db2/support/data" | xargs -I{} mv -i {} /db2/support/data

FOCUS ON LARGEST FILES. Between cleanups large amounts of space can be freed by compressing your typically larger files throughout the entire directory tree – for example table export files and report output. This can be done via variations of this command:

find /db2 -name "*.out" | xargs gzip

To find files that are not compressed:

ls | grep -v "\.gz$"

When doing cleanup, you may want to begin by identifying the largest files in your support directory – then either compress or remove them. There are several expressions that can help with this process; these will all list files in descending size order:

If you don’t want to see subdirectory individual file contents:

du -k * | sort -rn | grep -v "/" | less

If you do want to see size of files in subdirectories:

du -k * | sort -rn | less

RMDIR EMPTY DIRECTORIES: Remove directories that are empty and which are not working directories referenced by scripts.

KEEP AN EYE ON PERMISSIONS. If there is more than one person involved in maintaining a server directory, problems can quickly emerge with clashing file ownership and permissions. This is especially true if system administrators log on as root and create files during those sessions. This can easily result in unbrowsable directories and files nobody is able to read, edit, move or delete.

Do files have the correct ownership? If you have set a file to mode 600 and own it under your personal account, but it sits in a shared project directory, other DBAs may be unable to read, write or delete it. Make sure shared DBA project files have the correct permissions, whether 777, 666, 660, etc., depending on how your Unix groups are used to define the team.

Look for files owned by root and have the root-authority users in your shop change the owner, or open up read/write/execute to the DBA Unix group, as appropriate.
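A quick way to generate that list to hand to your sysadmins:

```shell
# List everything under the support tree owned by root, with full
# details, so it can be reassigned to the DBA group.
find /db2/support -user root -exec ls -ld {} \;
```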

Use Unix permissions as a safety net to protect important files from being accidentally overwritten, e.g.

chmod a-w configuration.dat

DOWNLOAD YOUR IMPORTANT SCRIPTS, SQL, DDL. For safety, download your scripts, SQL and DDL periodically to your laptop or some other location. Even if the server has been backed up to tape, system administrators may have difficulty doing a tape restore should you accidentally delete just a single file or directory. Your company should have a sound backup strategy, but even safer than nightly backups is an occasional additional backup made by the DBA, to media that the DBA, not another department, controls. Remember to secure this media.
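One rough sketch of such a periodic download; the destination host and path in the scp step are hypothetical:

```shell
#!/bin/ksh
# Bundle the reusable DBA assets into one dated archive and copy it
# off the server to media the DBA controls.
STAMP=$(date +%Y%m%d)
ARCH=/tmp/dba_support.$STAMP.tar.gz

tar cf - /db2/support/scripts /db2/support/sql /db2/support/ddl 2>/dev/null |
    gzip > "$ARCH"
scp "$ARCH" dba@workstation.example.com:/backups/
```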

THE PURGE

During cleanup of DBA directories, you will need to make decisions about what files need to be retained permanently, which can be pruned and after how long. As with DB2 generated files, you should consult your organization’s retention guidelines. When deleting files based on age, first review the list that your pruning script or command would affect without actually deleting the files, by substituting the ‘ls’ command for the ‘rm’. Possibly write that output to a file for permanent storage, giving you a record of what was there before your cleanup began. Review that document and revise your plan. Only then do the actual delete. Resist the temptation to rm using wildcards.

To review files older than 90 days, something like this would work:

find /db2/support/data -mtime +90 | xargs ls -ld | less

The specific approach that I follow to prune safely goes something like this:

  1. find . -name "*.out*" -mtime +60 > purge_files.20081124.txt
  2. now edit file purge_files.20081124.txt
  3. confirm you’re happy with your purge decisions, then . . .
  4. cat purge_files.20081124.txt | xargs rm

Some other suggestions come to mind:

POSSIBLY KEEP FOREVER. Some table backups, such as a table representing system-level application configuration data, might never be deleted. You might want to keep deltas of database configuration changes and db2look output throughout the lifetime of the database.

BE WARY OF THE DATABASE DIRECTORY. Be extremely careful whenever you are working in the database directory. An errant recursive rm could wreak major havoc on your database. Keeping clear of the database directory when doing maintenance is one of the reasons for having a dedicated and separate DBA support directory tree.

DATABASE BACKUP IMAGES. You may maintain database backup images on disk. If so, add maintenance of these files to your list of chores, and be extremely careful in their management.

LOAD/EXPORT/IMPORT MESSAGE FILES. Once .msg output has been reviewed for errors after completion of the LOAD/EXPORT/IMPORT task, that .msg output can usually be deleted.
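For example, to preview two-week-old message files before removing them (the 14-day cutoff is arbitrary; use your shop's standard):

```shell
# Preview .msg files older than two weeks; once satisfied with the
# list, swap ls for rm as described under THE PURGE above.
find /db2/support/data -name '*.msg' -mtime +14 | xargs ls -ld
```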

EVENT MONITOR. Remove old event monitor stream files and evmon formatted output.

YOUR OWN SCRIPT OUTPUT. Are there any large and growing logs from your own monitoring processes that should be pruned? Or, renamed and compressed?

/TMP. Clean up the /tmp directory and any other directories you use for scratch file storage.


Good luck and prune safely!