Writing the River: File Naming Standards and Image Access Metadata


Files for the Writing the River (WTR) project will be given unique names as described in this document. The collection name will appear in Digital Library Extension Service (DLXS) software as "wtr".

For wtr, each collection entity or work will be assigned its own unique directory, named using the collection code and a consecutive accession number, e.g. wtr00001, wtr00002, and so on. Each of these entity directories will then hold the associated image files, OCR text files, and PDF files for the applicable pages, as shown in the following examples:




Explanation: “wtr00001” and “wtr00002” are the IDs associated with two books of poetry. 000001.tif is the image file for numbered page 1 of the first book, 000001.pdf is the PDF file for numbered page 1 of the first book, and 000001.txt is the OCR file for numbered page 1 of the first book. Each entity directory will also hold the actual SGML files as well.
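The naming pattern described above can be sketched in a few lines of Python. This is only an illustration, not part of the project tooling: the function names are hypothetical, and the zero-padding widths (five digits for the accession number, six for the page number) are inferred from the examples in this document.

```python
from pathlib import PurePosixPath

COLLECTION_CODE = "wtr"                  # collection name as it appears in DLXS
PAGE_EXTENSIONS = ("tif", "pdf", "txt")  # page image, PDF, OCR text


def entity_id(accession_number: int) -> str:
    """Build an entity directory name such as wtr00001 (5-digit padding assumed)."""
    return f"{COLLECTION_CODE}{accession_number:05d}"


def page_file_names(page_number: int) -> list[str]:
    """Build the three per-page file names such as 000001.tif (6-digit padding assumed)."""
    return [f"{page_number:06d}.{ext}" for ext in PAGE_EXTENSIONS]


def page_paths(accession_number: int, page_number: int) -> list[str]:
    """Join the entity directory and the per-page file names into relative paths."""
    directory = PurePosixPath(entity_id(accession_number))
    return [str(directory / name) for name in page_file_names(page_number)]


print(page_paths(1, 1))
# ['wtr00001/000001.tif', 'wtr00001/000001.pdf', 'wtr00001/000001.txt']
```

The sketch does not name the SGML files that each entity directory also holds, because this document does not specify their names.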

DLXS links to these files by way of the SGML Page Break Element and Pageviewer-idx, which will be built from a FileMaker Pro database that includes the following fields:

REF: the file name of the page image.
SEQ: the sequence number of the page in the series, from start to finish, of all the pages in the document.
RES: the resolution of the page image.
FMT: the file format of the page image.
FTR: the feature of the page, given as a 3-letter code. Examples: BIB for bibliography, CTP for Cover Title Page, etc. See full list in DLXS documentation.
CNF: the confidence value of the OCR for the page.
N: the page number, not as a sequence, but rather the number as printed on the actual page. This attribute cannot be omitted.
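As a rough sketch of how one database record could be turned into a page break tag, the Python below maps the seven fields above onto attributes of a PB element. The class and function names are hypothetical, the sample values for SEQ, RES, FMT, and CNF are invented for illustration, and the attribute order and value formats are assumptions; the DLXS documentation is the authority for the actual syntax.

```python
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import quoteattr


@dataclass
class PageRecord:
    """One row of the page database (hypothetical structure)."""
    ref: str          # REF: file name of the page image
    seq: str          # SEQ: position of the page within the whole document
    res: str          # RES: resolution of the page image
    fmt: str          # FMT: file format of the page image
    ftr: str          # FTR: 3-letter page feature code, e.g. BIB or CTP
    cnf: str          # CNF: OCR confidence value for the page
    n: Optional[str]  # N: page number as printed on the page (required)


def page_break_tag(record: PageRecord) -> str:
    """Render a record as an SGML-style PB start tag."""
    if record.n is None:
        # The N attribute cannot be omitted.
        raise ValueError("N is required for every page")
    attributes = [
        ("REF", record.ref), ("SEQ", record.seq), ("RES", record.res),
        ("FMT", record.fmt), ("FTR", record.ftr), ("CNF", record.cnf),
        ("N", record.n),
    ]
    # quoteattr adds the surrounding quotes and escapes special characters.
    rendered = " ".join(f"{name}={quoteattr(value)}" for name, value in attributes)
    return f"<PB {rendered}>"


# Illustrative values only; none of these come from a real record.
example = PageRecord(ref="000001.tif", seq="1", res="600dpi", fmt="TIFF",
                     ftr="CTP", cnf="95", n="1")
print(page_break_tag(example))
```

Treating a missing N as an error in the sketch mirrors the rule that the attribute cannot be omitted.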
- A. Lim

