Project Information - Linus Pauling Research Notebooks - Special Collections & Archives Research Center

Project Information

This site was originally launched in February 2002.

Scanning

Pauling's research notebooks were scanned using an Epson GT-10000+ flatbed scanner, which can accommodate originals up to 11"x17". The larger bed was needed because the research notebooks do not fit on a standard 8.5"x11" flatbed scanner. The pages were scanned with Epson's TWAIN scanning software at 24-bit color and 150 dpi and then sent to Adobe Photoshop, where they were saved as TIF files. Because of the way the notebooks fit onto the scanner, the majority of the scans had to be rotated by either 90 or 180 degrees. The rotation was handled with the batch conversion module of the freeware graphics program Irfanview.
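As a rough illustration of the numbers involved, the following Python sketch works out the pixel dimensions of a scan at 150 dpi, its uncompressed size at 24-bit color, and how a 90 or 180 degree rotation changes the dimensions. It assumes the full 11"x17" bed was captured, which the description above does not state. The function names are illustrative and are not part of the Epson software, Photoshop, or Irfanview.

```python
def scan_pixels(width_in, height_in, dpi=150):
    """Pixel dimensions of a scan of the given size in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px, height_px, bits_per_pixel=24):
    """Raw image size before any compression or file header."""
    return width_px * height_px * bits_per_pixel // 8


def rotated_size(width_px, height_px, degrees):
    """Dimensions after a rotation by a multiple of 90 degrees."""
    if degrees % 90 != 0:
        raise ValueError("only multiples of 90 degrees are handled")
    # 90 and 270 swap the sides; 0 and 180 leave them unchanged.
    return (height_px, width_px) if degrees % 180 == 90 else (width_px, height_px)


# Assumption: the full 11" x 17" scanner bed was captured.
full_bed = scan_pixels(11, 17)
print(full_bed)                       # (1650, 2550)
print(uncompressed_bytes(*full_bed))  # 12622500
print(rotated_size(*full_bed, 90))    # (2550, 1650)
```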

Generating JPG Files

From each TIF file, two separate JPG files were created for display on the Web: one for regular viewing and an enlarged one for detailed viewing. Irfanview's batch conversion module was used again to generate the JPG files from the TIF files of the scans. Each notebook was processed separately. The first batch of JPG files was generated with a width of 1000 pixels and then renamed to mark these files as the larger of the two sizes. The second batch of JPG files was then generated with a width of 500 pixels.
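The two-size scheme can be sketched in Python as a small planning function. The 1000 and 500 pixel widths come from the description above. The "-large" suffix, the sample file name, the source dimensions, and the assumption that the height is scaled proportionally are all hypothetical, since the actual naming convention and Irfanview settings are not recorded here.

```python
from pathlib import PurePosixPath

# Widths come from the text; the "-large" suffix is a hypothetical naming choice.
DERIVATIVES = (("-large", 1000), ("", 500))


def scaled_height(src_w, src_h, target_w):
    """Height that keeps the aspect ratio when resizing to target_w."""
    return round(src_h * target_w / src_w)


def plan_derivatives(tif_name, src_w, src_h):
    """List (jpg_name, width, height) for each JPG made from one TIF."""
    stem = PurePosixPath(tif_name).stem
    return [
        (f"{stem}{suffix}.jpg", width, scaled_height(src_w, src_h, width))
        for suffix, width in DERIVATIVES
    ]


# Hypothetical file name and source size (a full 11" x 17" bed at 150 dpi).
for row in plan_derivatives("notebook01_page001.tif", 1650, 2550):
    print(row)
```

The function only plans names and sizes; the actual resizing would still be done by an image tool such as Irfanview.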

Indexing

Each notebook was thoroughly reviewed by a staff member, who noted the relevant subjects for each page. The list of pages and their corresponding subjects was then compiled into a content index for each notebook, similar to a table of contents. These content indexes provide the main organization for the site. An alphabetical subject index was also created by student assistants and staff. It gathers all entries for a subject in one place and links directly to the notebook pages relevant to that subject.
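The relationship between the two indexes can be sketched as a simple inversion: the content index maps each page to its subjects, and the subject index maps each subject back to its pages, grouped by first letter. The Python below is only an illustration of that idea. The original indexes were compiled by hand, and the entries shown are placeholders, not data from the notebooks.

```python
from collections import defaultdict

# Placeholder entries, not taken from the notebooks.
content_index = {
    ("Notebook 1", 1): ["Subject B", "subject a"],
    ("Notebook 1", 2): ["Subject B"],
    ("Notebook 2", 5): ["subject a", "Topic C"],
}


def build_subject_index(content_index):
    """Invert page -> subjects into letter -> subject -> sorted pages."""
    pages_by_subject = defaultdict(list)
    for page, subjects in content_index.items():
        for subject in subjects:
            pages_by_subject[subject].append(page)

    by_letter = defaultdict(dict)
    # Sort case-insensitively so "subject a" files before "Subject B".
    for subject in sorted(pages_by_subject, key=str.casefold):
        by_letter[subject[0].upper()][subject] = sorted(pages_by_subject[subject])
    return dict(by_letter)


subject_index = build_subject_index(content_index)
for letter, subjects in subject_index.items():
    print(letter, subjects)
```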

Page Generation

Pages were generated using XML for the source files and HTML as the final product. All of the index files were originally in WordPerfect 9. The content indexes were broken down by notebook; the alphabetical subject index was broken down by letter. To preserve italics, superscripts and subscripts, the WordPerfect files were converted using WordPerfect's built-in Internet Publisher and saved as HTML files. These files were then opened in HTML-Kit, a freeware HTML editor. Plugin scripts written for HTML-Kit first cleaned up the HTML produced by WordPerfect and then marked it up with the appropriate XML tags, although some manual tagging was still needed. At this point the files were in XML format and ready to be transformed into HTML with XSL stylesheets. Using the XSLT processor Saxon (previously XT), the content index XML files, and an XSL stylesheet custom designed in-house, the content index HTML pages were created, along with the two HTML pages needed for each research notebook page. All of the pages are interlinked, and the subject data shown on the content index page also appears on each corresponding notebook page. The alphabetical subject index pages were handled in a very similar way, using a different set of plugin files for HTML-Kit and a different XSL stylesheet.
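The shape of the final transformation step can be sketched as follows. This is not the project's XSL stylesheet and does not use Saxon: it is a Python stand-in built on the standard library's ElementTree, meant only to show one XML content index yielding a content index page plus two HTML pages per notebook page. The tag names, attribute names, file names, and sample subjects are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical XML layout; the project's real tag names are not documented here.
SOURCE = """
<notebook id="rnb01" title="Research Notebook 1">
  <page n="1"><subject>Placeholder subject one</subject></page>
  <page n="2"><subject>Placeholder subject two</subject>
              <subject>Placeholder subject three</subject></page>
</notebook>
"""


def render(xml_text):
    """Return {filename: html} for the content index and two files per page."""
    book = ET.fromstring(xml_text)
    book_id, files, rows = book.get("id"), {}, []
    for page in book.findall("page"):
        n = page.get("n")
        subjects = ", ".join(escape(s.text or "") for s in page.findall("subject"))
        regular, large = f"{book_id}-{n}.html", f"{book_id}-{n}-large.html"
        rows.append(f'<li><a href="{regular}">Page {n}</a>: {subjects}</li>')
        # Regular view links to the enlarged view; both repeat the subjects.
        files[regular] = (f'<a href="{large}"><img src="{book_id}-{n}.jpg"></a>'
                          f"<p>{subjects}</p>")
        files[large] = f'<img src="{book_id}-{n}-large.jpg"><p>{subjects}</p>'
    title = escape(book.get("title", ""))
    files[f"{book_id}-index.html"] = f"<h1>{title}</h1><ul>{''.join(rows)}</ul>"
    return files


for name in render(SOURCE):
    print(name)
```

Because the subject text is read once from the XML and written into both the index row and the page files, the index and the individual pages cannot drift apart, which is the property the paragraph above describes.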

Final Presentation

After the image and file processing was complete, the JPG files and the HTML files were uploaded to our web server. A single Cascading Style Sheet (CSS) file provides consistent formatting across all of the HTML files. Navigation begins with the content index or the alphabetical subject index. "Next Page" and "Previous Page" links allow navigation from page to page, and a "Go to Page" box allows jumping to any single page within any of the notebooks. Larger images of the scanned pages are available by clicking on the regular-sized image or on the "Enlarged View" link below the image. The header graphic was designed by a student in Adobe Illustrator 9 using an original photograph from the Ava Helen and Linus Pauling Papers [Photographs, 1985i.33].
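The page-to-page navigation logic can be sketched in a few lines of Python. This is an assumption-laden illustration, not the site's code: the function names are hypothetical, the 120-page notebook is invented for the example, and whether the real site links across notebook boundaries is not stated above, so the sketch simply omits the link at the first and last page.

```python
def page_links(page, last_page, first_page=1):
    """Previous/next targets for one page; None where no link applies."""
    if not first_page <= page <= last_page:
        raise ValueError(f"page {page} is outside {first_page}-{last_page}")
    return {
        "previous": page - 1 if page > first_page else None,
        "next": page + 1 if page < last_page else None,
    }


def go_to_page(requested, last_page, first_page=1):
    """Parse "Go to Page" input; return a valid page number or None."""
    try:
        page = int(str(requested).strip())
    except ValueError:
        return None
    return page if first_page <= page <= last_page else None


# Hypothetical notebook with 120 pages.
print(page_links(1, 120))       # {'previous': None, 'next': 2}
print(page_links(120, 120))     # {'previous': 119, 'next': None}
print(go_to_page(" 57 ", 120))  # 57
print(go_to_page("abc", 120))   # None
```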

Statistics

Number of Notebooks: 47
Total Number of Scanned Pages (TIFs): 7,680
Total Number of Page Images (JPGs): 15,360 (two per scanned page)
Total Number of HTML files: 15,400
