Name
pg_batch -- run SQL jobs in parallel
Synopsis
pg_batch [OPTIONS] FILENAME [args...]
- FILENAME [args...]
- Specify the filename of the script that lists jobs.
If the queries in the script take parameters, they can be passed with args.
If FILENAME is -, the program reads the script from standard input.
The following options are available for OPTIONS.
See Options for details.
- Job options
    - -j, --jobs=CONNECTIONS : number of parallel sessions
    - -t, --timeout=TIMEOUT : timeout in seconds
 
- Connection Options
    - -a, --all : process all databases
    - -d, --dbname=DBNAME : database to connect
    - -h, --host=HOSTNAME : database server host or socket directory
    - -p, --port=PORT : database server port
    - -U, --username=USERNAME : user name to connect as
    - -W, --password : force password prompt
 
- Generic Options
    - -e, --echo : echo queries
    - -E, --elevel=LEVEL : set output message level
    - --help : show the help, then exit
    - --version : output version information, then exit
 
Description
pg_batch executes SQL jobs in PostgreSQL.
It first runs a SQL script that lists the jobs, then runs the resulting SQL statements serially or in parallel.
pg_batch provides the following features:
- Runs a specified job-listing script and generates a list of jobs.
- Runs each job in the list in a separate transaction.
- Executes jobs in parallel using the specified number of sessions.
- Writes each query, its start and end times, and whether it succeeded or failed to the console.
- Cancels the remaining jobs after the timeout, if one is specified.
As a useful example, a script that lists VACUUM commands is included.
The script can be used instead of vacuumdb.
Examples
Run VACUUM jobs for the test database.
$ pg_batch -d test $PGSHARE/contrib/pg_batch_vacuum.sql
Run user-jobs.sql for all databases.
$ pg_batch --all user-jobs.sql
Options
The following command line options are available in pg_batch.
Job options
- -j CONNECTIONS
 --jobs=CONNECTIONS
- The number of jobs to execute in parallel.
Valid values are 1 to 32.
The default is 1.
- -t TIMEOUT
 --timeout=TIMEOUT
- Specify the timeout, in seconds, after which the remaining jobs are canceled.
0 means no timeout.
The default is 1.
Connection Options
Options to connect to servers.
- -a
 --all
- Process all databases.
The databases are processed sequentially; jobs in different databases cannot run in parallel, even with the -j option.
The same job-listing script is used for every database.
- -d dbname
 --dbname dbname
- Specifies the name of the database to be processed. If this is not specified and -a (or --all) is not used, the database name is read from the environment variable PGDATABASE. If that is not set, the user name specified for the connection is used. 
- -h host
 --host host
- Specifies the host name of the machine on which the server is running. If the value begins with a slash, it is used as the directory for the Unix domain socket. 
- -p port
 --port port
- Specifies the TCP port or local Unix domain socket file extension on which the server is listening for connections.
- -U username
 --username username
- User name to connect as. 
- -W
 --password
- Force the program to prompt for a password before connecting to a database.
- This option is never essential, since the program will automatically prompt for a password if the server demands password authentication. However, pg_batch will waste a connection attempt finding out that the server wants a password. In some cases it is worth typing -W to avoid the extra connection attempt.
Generic Options
- -e
 --echo
- Echo commands sent to server.
- -E LEVEL
 --elevel=LEVEL
- Choose the output message level from DEBUG, INFO, NOTICE, WARNING, ERROR, LOG, FATAL, and PANIC.
The default is INFO.
- --help
- Show usage of the program.
- --version
- Show the version number of the program.
Environments
- PGDATABASE
 PGHOST
 PGPORT
 PGUSER
- Default connection parameters.
This utility, like most other PostgreSQL utilities, also uses the environment variables supported by libpq (see Environment Variables).
Restrictions
pg_batch has the following restrictions and limitations:
- The results of SQL jobs are ignored.
- The status of a job depends only on whether the command was successfully committed.
- A job must be a single command; flow control is not supported.
- If you need flow control, run a user-defined function as the job.
- No resource quotas.
- Resources can be controlled only through the number of sessions (-j) and vacuum_cost_delay.
- Databases are processed one at a time.
- The -j option parallelizes jobs only within a single database.
If you want to run jobs in multiple databases in parallel, run multiple pg_batch processes.
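The last restriction above suggests starting one pg_batch process per database. The following is a minimal sketch of that idea in Python, assuming pg_batch is on the PATH. The helper names build_commands and run_all are hypothetical, and only the -d and -j options documented above are used.

```python
import subprocess


def build_commands(databases, script, jobs=1):
    """Build one pg_batch command line per database (hypothetical helper)."""
    if not 1 <= jobs <= 32:
        raise ValueError("-j accepts values from 1 to 32")
    return [["pg_batch", "-d", db, "-j", str(jobs), script] for db in databases]


def run_all(databases, script, jobs=1):
    """Start every pg_batch process at once, then wait for all of them.

    Returns a dict that maps each database name to its process exit code.
    """
    commands = build_commands(databases, script, jobs)
    # Popen returns immediately, so the processes run side by side.
    procs = {db: subprocess.Popen(cmd) for db, cmd in zip(databases, commands)}
    return {db: proc.wait() for db, proc in procs.items()}
```

For example, run_all(["db1", "db2"], "user-jobs.sql", jobs=4) would start two pg_batch processes, each using up to four sessions in its own database; the database names here are placeholders.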
Details
System
pg_batch is a single-threaded program, but uses multiple sessions (-j option) and asynchronous queries to execute SQL jobs in parallel.
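
The execution model can be pictured with a small simulation. The sketch below is not pg_batch's implementation, which this document does not show; it only illustrates, with Python's asyncio, how a single thread can keep several sessions busy at once. asyncio.sleep stands in for waiting on an asynchronous query, and run_jobs and the job tuples are hypothetical.

```python
import asyncio


async def run_jobs(jobs, sessions):
    """Run (name, seconds) jobs on a fixed number of simulated sessions.

    Everything runs in one thread; asyncio.sleep stands in for waiting
    on an asynchronous query. Returns job names in completion order.
    """
    queue = asyncio.Queue()
    for job in jobs:
        queue.put_nowait(job)
    finished = []

    async def session():
        # Each simulated session takes the next pending job until none remain.
        while not queue.empty():
            name, seconds = queue.get_nowait()
            await asyncio.sleep(seconds)
            finished.append(name)

    await asyncio.gather(*(session() for _ in range(sessions)))
    return finished


def demo(sessions):
    """Run three short simulated jobs with the given number of sessions."""
    jobs = [("a", 0.2), ("b", 0.05), ("c", 0.05)]
    return asyncio.run(run_jobs(jobs, sessions))
```

With sessions=1 the jobs complete strictly in list order. With more sessions, an idle session picks up the next pending job as soon as its previous one ends, which is the role the -j option plays.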
 
Job-listing scripts
A job-listing script must contain a query that returns SETOF (query AS text, priority AS float8).
"query" is the SQL statement to be executed as a job.
"priority" is the priority of the job.
In general, a job-listing script looks like:
SELECT query, priority FROM jobs_list WHERE should_be_run_today;
The "priority" column can be omitted, in which case every job has priority 0.
pg_batch executes jobs in descending order of "priority".
In addition, if jobs are canceled by a timeout or Ctrl+C,
it raises a WARNING for each canceled job whose priority is 0 or higher.
The priority should be chosen according to whether the job must be executed or not.
For example, if you run jobs once per day, priorities can be assigned with the following rules:
- Jobs with zero or positive "priority" : jobs that should be executed today
- Jobs with negative "priority" : jobs that might be executed if there is enough time
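The ordering rules above can be sketched in a few lines of Python. This is an illustration only, not pg_batch's code; order_jobs and must_run are hypothetical names, and keeping the listed order for jobs of equal priority is an assumption, since the document does not specify tie-breaking.

```python
def order_jobs(rows):
    """Sort job rows as described above: highest priority first.

    Each row is (query,) or (query, priority); a missing priority is 0.
    """
    jobs = [(row[0], float(row[1]) if len(row) > 1 else 0.0) for row in rows]
    # sorted() is stable, so jobs with equal priority keep their listed
    # order (an assumption; the document does not define tie-breaking).
    return sorted(jobs, key=lambda job: job[1], reverse=True)


def must_run(priority):
    """Jobs with priority >= 0 are the ones that should be executed today."""
    return priority >= 0
```

For instance, rows listing a job with priority -1, a job with no priority, and a job with priority 5 would be executed in the order 5, 0, -1, and only the first two count as jobs that should run today.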
Timeout
If a timeout is specified, the job that is running when the timeout expires is canceled and the remaining jobs are skipped.
When jobs with priority 0 or higher are canceled or skipped, pg_batch writes a WARNING message for each such job.
Also, the exit code of pg_batch will be non-zero in such cases.
An interruption with Ctrl+C is treated the same as a timeout.
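The effect of a timeout on the remaining jobs can be sketched as follows. This is a simplified model in Python, not pg_batch's code: it covers skipped jobs only, report_unfinished is a hypothetical name, the message layout follows the SKIP template in the Log messages section, and the exit code 1 merely stands for "non-zero".

```python
def report_unfinished(jobs, completed):
    """Return (messages, exit_code) for the jobs skipped after a timeout.

    jobs: list of (query, priority) in execution order.
    completed: number of jobs that finished before the timeout.
    """
    total = len(jobs)
    messages = []
    exit_code = 0
    for no, (query, priority) in enumerate(jobs[completed:], start=completed + 1):
        if priority >= 0:
            level = "WARNING"
            exit_code = 1  # any non-zero value; the exact code is an assumption
        else:
            level = "INFO"
        messages.append(f"{level}: [{no}/{total}] SKIP: {query}")
    return messages, exit_code
```

A scheduler script could use the resulting exit code to tell apart a run where only optional (negative-priority) jobs were left over from a run where required jobs did not execute.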
Log messages
This section explains the log messages written by pg_batch.
- The number of jobs per database
INFO: DATABASE 'database name' (number of jobs jobs)
 
- Messages when job starts
INFO: [no/total] START: start timestamp (YYYY-MM-DD HH:MI:SS)
INFO: [no/total] QUERY: SQL
 
- Messages when job ends
/* when the job succeeds */
INFO: [no/total] SUCCESS: end timestamp (YYYY-MM-DD HH:MI:SS) (duration (HH:MM:SS))
/* when the job fails */
WARNING: [no/total] FAILED: end timestamp (YYYY-MM-DD HH:MI:SS) (duration (HH:MM:SS))
 
- Messages on timeout
/* when no jobs have failed */
INFO: TIMEOUT timestamp of timeout (YYYY-MM-DD HH:MI:SS)
/* when some jobs have failed */
WARNING: TIMEOUT timestamp of timeout (YYYY-MM-DD HH:MI:SS)
 
- Messages when jobs are skipped
/* for low-priority jobs (priority < 0) */
INFO: [no/total] SKIP: SQL
/* for high-priority jobs (priority >= 0) */
WARNING: [no/total] SKIP: SQL
 
- Error messages
ERROR: error messages
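
Because the per-job messages share one layout, they are easy to summarize with a script. The sketch below is a Python illustration that assumes the real output matches the templates above exactly (LEVEL: [no/total] EVENT: text); the spacing of actual pg_batch output has not been verified here, and parse_job_line and count_events are hypothetical helpers.

```python
import re

# Pattern assumed from the message templates above:
#   LEVEL: [no/total] EVENT: rest
JOB_LINE = re.compile(
    r"^(?P<level>[A-Z]+): \[(?P<no>\d+)/(?P<total>\d+)\] "
    r"(?P<event>START|QUERY|SUCCESS|FAILED|SKIP): (?P<rest>.*)$"
)


def parse_job_line(line):
    """Parse one per-job log line into a dict, or return None if it does not match."""
    match = JOB_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    fields = match.groupdict()
    fields["no"] = int(fields["no"])
    fields["total"] = int(fields["total"])
    return fields


def count_events(lines):
    """Count SUCCESS, FAILED and SKIP events in an iterable of log lines."""
    counts = {"SUCCESS": 0, "FAILED": 0, "SKIP": 0}
    for line in lines:
        fields = parse_job_line(line)
        if fields and fields["event"] in counts:
            counts[fields["event"]] += 1
    return counts
```

Lines that do not follow the per-job layout, such as the DATABASE and TIMEOUT messages, are ignored by count_events.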
 
Here is an example from VACUUM jobs.
Script for VACUUM jobs
The script runs VACUUM on each table that needs to be vacuumed, as autovacuum does.
Only tables that have many dead tuples or are close to transaction ID wraparound are vacuumed.
Compared with vacuumdb, the script skips tables that do not need to be vacuumed and can run VACUUMs in parallel.
The script runs VACUUMs more aggressively than autovacuum.
It uses the following threshold to decide whether to run VACUUM.
See also "The Autovacuum Daemon" for details.
 
Installation
pg_batch can be installed like other standard contrib modules.
Build
The module can be built with pgxs.
$ cd pg_batch
$ make USE_PGXS=1
$ make USE_PGXS=1 install
Database registration
No registration is required just to use pg_batch.
If you want to run the VACUUM jobs, you need to run $PGSHARE/contrib/pg_batch.sql in each database.
$ psql -d dbname -f $PGSHARE/contrib/pg_batch.sql
Requirements
- PostgreSQL version
- PostgreSQL 8.4, 9.0
- OS
- RHEL 5.2, Windows XP SP3
See also
vacuumdb