Each catalog that has any manually-created initial data (some do not)
   has a corresponding .dat file that contains its
   initial data in an editable format.
  
    Each .dat file contains Perl data structure literals
    that are simply eval'd to produce an in-memory data structure consisting
    of an array of hash references, one per catalog row.
    A slightly modified excerpt from pg_database.dat
    will demonstrate the key features:
   
[
# A comment could appear here.
{ oid => '1', oid_symbol => 'TemplateDbOid',
  descr => 'database\'s default template',
  datname => 'template1', datdba => 'PGUID', encoding => 'ENCODING',
  datcollate => 'LC_COLLATE', datctype => 'LC_CTYPE', datistemplate => 't',
  datallowconn => 't', datconnlimit => '-1', datlastsysoid => '0',
  datfrozenxid => '0', datminmxid => '1', dattablespace => '1663',
  datacl => '_null_' },
]
Points to note:
The overall file layout is: open square bracket, one or more sets of curly braces each of which represents a catalog row, close square bracket. Write a comma after each closing curly brace.
      Within each catalog row, write comma-separated
      key =>
      value pairs.  The
      allowed keys are the names of the catalog's
      columns, plus the metadata keys oid,
      oid_symbol, and descr.
      (The use of oid and oid_symbol
      is described in Section 70.2.2
      below.  descr supplies a description string for
      the object, which will be inserted
      into pg_description
      or pg_shdescription as appropriate.)
      While the metadata keys are optional, the catalog's defined columns
      must all be provided, except when the catalog's .h
      file specifies a default value for the column.
     
      All values must be single-quoted.  Escape single quotes used within a
      value with a backslash.  Backslashes meant as data can, but need not,
      be doubled; this follows Perl's rules for simple quoted literals.
      Note that backslashes appearing as data will be treated as escapes by
      the bootstrap scanner, according to the same rules as for escape string
      constants (see Section 4.1.2.2); for
      example \t converts to a tab character.  If you
      actually want a backslash in the final value, you will need to write
      four of them: Perl strips two, leaving \\ for the
      bootstrap scanner to see.
     
      Null values are represented by _null_.
      (Note that there is no way to create a value that is just that
      string.)
     
      Comments are preceded by #, and must be on their
      own lines.
     
To aid readability, field values that are OIDs of other catalog entries can be represented by names rather than numeric OIDs. This is described in Section 70.2.3 below.
      Since hashes are unordered data structures, field order and line
      layout aren't semantically significant.  However, to maintain a
      consistent appearance, we set a few rules that are applied by the
      formatting script reformat_dat_file.pl:
      
         Within each pair of curly braces, the metadata
         fields oid, oid_symbol,
         and descr (if present) come first, in that
         order, then the catalog's own fields appear in their defined order.
        
Newlines are inserted between fields as needed to limit line length to 80 characters, if possible. A newline is also inserted between the metadata fields and the regular fields.
         If the catalog's .h file specifies a default
         value for a column, and a data entry has that same
         value, reformat_dat_file.pl will omit it from
         the data file.  This keeps the data representation compact.
        
         reformat_dat_file.pl preserves blank lines
         and comment lines as-is.
        
      It's recommended to run reformat_dat_file.pl
      before submitting catalog data patches.  For convenience, you can
      simply change to src/include/catalog/ and
      run make reformat-dat-files.
     
      If you want to add a new method of making the data representation
      smaller, you must implement it
      in reformat_dat_file.pl and also
      teach Catalog::ParseData() how to expand the
      data back into the full representation.
     
    A catalog row appearing in the initial data can be given a
    manually-assigned OID by writing an oid
    =>  metadata field.
    Furthermore, if an OID is assigned, a C macro for that OID can be
    created by writing an nnnnoid_symbol
    =>  metadata field.
   name
    Pre-loaded catalog rows must have preassigned OIDs if there are OID
    references to them in other pre-loaded rows.  A preassigned OID is
    also needed if the row's OID must be referenced from C code.
    If neither case applies, the oid metadata field can
    be omitted, in which case the bootstrap code assigns an OID
    automatically, or leaves it zero in a catalog that has no OIDs.
    In practice we usually preassign OIDs for all or none of the pre-loaded
    rows in a given catalog, even if only some of them are actually
    cross-referenced.
   
    Writing the actual numeric value of any OID in C code is considered
    very bad form; always use a macro, instead.  Direct references
    to pg_proc OIDs are common enough that there's
    a special mechanism to create the necessary macros automatically;
    see src/backend/utils/Gen_fmgrtab.pl.  Similarly
    — but, for historical reasons, not done the same way —
    there's an automatic method for creating macros
    for pg_type
    OIDs.  oid_symbol entries are therefore not
    necessary in those two catalogs.  Likewise, macros for
    the pg_class OIDs of system catalogs and
    indexes are set up automatically.  For all other system catalogs, you
    have to manually specify any macros you need
    via oid_symbol entries.
   
    To find an available OID for a new pre-loaded row, run the
    script src/include/catalog/unused_oids.
    It prints inclusive ranges of unused OIDs (e.g., the output
    line “45-900” means OIDs 45 through 900 have not been
    allocated yet).  Currently, OIDs 1-9999 are reserved for manual
    assignment; the unused_oids script simply looks
    through the catalog headers and .dat files
    to see which ones do not appear.  You can also use
    the duplicate_oids script to check for mistakes.
    (genbki.pl will also detect duplicate OIDs
    at compile time.)
   
    The OID counter starts at 10000 at the beginning of a bootstrap run.
    If a catalog row is in a table that requires OIDs, but no OID was
    preassigned by an oid field, then it will
    receive an OID of 10000 or above.
   
    Cross-references from one initial catalog row to another can be written
    by just writing the preassigned OID of the referenced row.  But
    that's error-prone and hard to understand, so for frequently-referenced
    catalogs, genbki.pl provides mechanisms to write
    symbolic references instead.  Currently this is possible for references
    to access methods, functions, operators, opclasses, opfamilies, and
    types.  The rules are as follows:
   
      Use of symbolic references is enabled in a particular catalog column
      by attaching BKI_LOOKUP(
      to the column's definition, where lookuprule)lookuprule
      is pg_am, pg_proc,
      pg_operator, pg_opclass,
      pg_opfamily, or pg_type.
      BKI_LOOKUP can be attached to columns of
      type Oid, regproc, oidvector,
      or Oid[]; in the latter two cases it implies performing a
      lookup on each element of the array.
     
      In such a column, all entries must use the symbolic format except
      when writing 0 for InvalidOid.  (If the column is
      declared regproc, you can optionally
      write - instead of 0.)
      genbki.pl will warn about unrecognized names.
     
      Access methods are just represented by their names, as are types.
      Type names must match the referenced pg_type
      entry's typname; you do not get to use any
      aliases such as integer
      for int4.
     
      A function can be represented by
      its proname, if that is unique among
      the pg_proc.dat entries (this works like regproc
      input).  Otherwise, write it
      as proname(argtypename,argtypename,...),
      like regprocedure.  The argument type names must be spelled exactly as
      they are in the pg_proc.dat entry's
      proargtypes field.  Do not insert any
      spaces.
     
      Operators are represented
      by oprname(lefttype,righttype),
      writing the type names exactly as they appear in
      the pg_operator.dat
      entry's oprleft
      and oprright fields.
      (Write 0 for the omitted operand of a unary
      operator.)
     
      The names of opclasses and opfamilies are only unique within an
      access method, so they are represented
      by access_method_name/object_name.
     
In none of these cases is there any provision for schema-qualification; all objects created during bootstrap are expected to be in the pg_catalog schema.
    genbki.pl resolves all symbolic references while it
    runs, and puts simple numeric OIDs into the emitted BKI file.  There is
    therefore no need for the bootstrap backend to deal with symbolic
    references.
   
Here are some suggestions about the easiest ways to perform common tasks when updating catalog data files.
Add a new column with a default to a catalog: 
     Add the column to the header file with
     a BKI_DEFAULT(
     annotation.  The data file need only be adjusted by adding the field
     in existing rows where a non-default value is needed.
    value)
Add a default value to an existing column that doesn't have
     one: 
     Add a BKI_DEFAULT annotation to the header file,
     then run make reformat-dat-files to remove
     now-redundant field entries.
    
Remove a column, whether it has a default or not: 
     Remove the column from the header, then run make
     reformat-dat-files to remove now-useless field entries.
    
Change or remove an existing default value: 
     You cannot simply change the header file, since that will cause the
     current data to be interpreted incorrectly.  First run make
     expand-dat-files to rewrite the data files with all
     default values inserted explicitly, then change or remove
     the BKI_DEFAULT annotation, then run make
     reformat-dat-files to remove superfluous fields again.
    
Ad-hoc bulk editing: 
     reformat_dat_file.pl can be adapted to perform
     many kinds of bulk changes.  Look for its block comments showing where
     one-off code can be inserted.  In the following example, we are going
     to consolidate two boolean fields in pg_proc
     into a char field:
     
        Add the new column, with a default,
        to pg_proc.h:
+ /* see PROKIND_ categories below */ + char prokind BKI_DEFAULT(f);
        Create a new script based on reformat_dat_file.pl
        to insert appropriate values on-the-fly:
-           # At this point we have the full row in memory as a hash
-           # and can do any operations we want. As written, it only
-           # removes default values, but this script can be adapted to
-           # do one-off bulk-editing.
+           # One-off change to migrate to prokind
+           # Default has already been filled in by now, so change to other
+           # values as appropriate
+           if ($values{proisagg} eq 't')
+           {
+               $values{prokind} = 'a';
+           }
+           elsif ($values{proiswindow} eq 't')
+           {
+               $values{prokind} = 'w';
+           }
Run the new script:
$ cd src/include/catalog $ perl rewrite_dat_with_prokind.pl pg_proc.dat
        At this point pg_proc.dat has all three
        columns, prokind,
        proisagg,
        and proiswindow, though they will appear
        only in rows where they have non-default values.
       
        Remove the old columns from pg_proc.h:
- /* is it an aggregate? */ - bool proisagg BKI_DEFAULT(f); - - /* is it a window function? */ - bool proiswindow BKI_DEFAULT(f);
        Finally, run make reformat-dat-files to remove
        the useless old entries from pg_proc.dat.
       
     For further examples of scripts used for bulk editing, see
     convert_oid2name.pl
     and remove_pg_type_oid_symbols.pl attached to this
     message:
     https://www.postgresql.org/message-id/CAJVSVGVX8gXnPm+Xa=DxR7kFYprcQ1tNcCT5D0O3ShfnM6jehA@mail.gmail.com