The only digital document format

What is a “document”?

A document is a record of some (typically written) content – a publication, a contract, a statement, a painting – at a moment in time. Until the advent of computers (and scanners), the media typically considered usable for such records included papyrus and vellum, which is basically leather. For a thousand years, more or less, paper has been the medium of choice.

Margaret Hamilton standing with a stack of paper.
Margaret Hamilton led a team credited with developing the software for NASA’s Apollo and Skylab. Her team was responsible for developing in-flight software, which included algorithms designed by various senior scientists for the Apollo command module and lunar lander. The image shows Hamilton in 1969, standing next to the navigation software that she and her MIT team produced for the Apollo project. Credit: NASA / Wikimedia Commons / Public Domain

That began to change in the 1980s.

Digital documents

PDF became the document format of choice for business, government and the general public because it delivers the key qualities of paper in a digital format. PDF is fixed, self-contained, readily shareable and relatively hard to change. It’s not just PDF’s innate characteristics that make it successful, but the fact that PDF pages interoperate smoothly with paper documents. “PDF it, send it, print it, sign it and return it” workflows introduced new efficiencies when the format surfaced into public consciousness in the mid-to-late 1990s. Even then, such workflows utilized only the most basic of PDF’s capabilities, but it was enough to dramatically accelerate the transition to digital documents. Within a few years, PDF files and email decimated document courier services.

Before long, users were scanning the signature page and adding it to (or replacing) the original page in the PDF; the cycle back to a digital document was complete. This new workflow, of course, was an extremely crude approach to facilitating document approvals, but the fact that end-users could do this very easily made PDF very tolerant of variations in workflow and records-keeping practices in a way that’s hard to imagine for databases and HTML.

PDF continues to evolve far beyond a simulacrum for paper. There’s a broad suite of features – tagging, XML-based metadata, attachments, 3D support, digital signatures and more – that support advanced document-handling and consuming workflows. PDF is so capable and so reliable that some wonder why anyone should bother with an archival subset at all.

Documents are for keeping

Not every PDF is designed with reliability in mind. For all its well-deserved reputation for reliably conveying the author’s intent to any viewer, PDF allows developers to make files that rely on external resources, or use encryption; both capabilities are non-starters for the preservation community. If the world preserves PDF files as documents – and it does – then preservationists need PDF/A.
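As a rough illustration of the encryption concern, the sketch below scans a file's raw bytes for the /Encrypt key, which an encrypted PDF carries in its trailer dictionary. The function name and the sample byte strings are hypothetical, made up for this example. A byte scan like this is only a heuristic: it can report a false positive if the same characters appear elsewhere in the file, and it is no substitute for a real PDF/A validator.

```python
import re

# Match the PDF name /Encrypt as a whole token, so that longer names
# such as /EncryptMetadata do not count as a hit.
ENCRYPT_KEY = re.compile(rb"/Encrypt(?![A-Za-z0-9])")


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: True if an /Encrypt key appears anywhere in the raw bytes."""
    return ENCRYPT_KEY.search(pdf_bytes) is not None


if __name__ == "__main__":
    # Hypothetical trailer fragments for illustration, not complete PDF files.
    plain = b"trailer\n<< /Size 12 /Root 1 0 R >>"
    locked = b"trailer\n<< /Size 12 /Root 1 0 R /Encrypt 11 0 R >>"
    print(looks_encrypted(plain))
    print(looks_encrypted(locked))
```

In practice you would read the bytes from disk (for example with `open(path, "rb").read()`) and treat a positive result as a prompt to run a proper conformance check, not as a verdict.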

Introduced in 2005 as ISO 19005, PDF/A is now required or considered best practice in workflows that generate valuable documents. Filing cabinets and storage boxes are disappearing as ECM systems, cloud storage and local capacity swallow the documents that used to exist only on paper. When new documents are shared, the common ground is PDF. When finalized for records-retention purposes, ideally, they are PDF/A.

Some think HTML will “beat” PDF because it’s more flexible and less static, but this misconstrues the two formats’ respective purposes and fails to appreciate that browser developers are (slowly) augmenting their support for PDF. PDF continues to gain mind-share: Google Trends data shows clearly that the number of searches for PDF documents relative to all other searches continues to rise.

PDF’s purpose is to serve in the role of “document”, with all that implies (see above). But that’s not the purpose of HTML. HTML isn’t a document, it’s an experience. PDF is how you keep it, and PDF/A is how you keep it forever.

Preserving the file’s actual bytes, of course, is up to you.

Documents of the future

This is not only the present, it’s also the future. PDF, an open, standardized, broadly capable digital document technology, has proven equal to the transition from paper to the electronic world. PDF’s advanced metadata, authentication, semantic tagging, attachments, 3D and other features provide a proven framework for future development of digital documents. PDF has no competitors. Even in the world of SharePoint, OpenText, Office 365 and Google Docs, PDF and PDF/A represent the only sufficiently flexible and capable technology for archiving the gamut of digital document content.

(This piece was adapted from a recent blog post)

Categories: Archives & Libraries, PDF/A

About the contributor

Duff Johnson

A veteran of the electronic document space, Duff Johnson is an independent consultant, Executive Director of the PDF Association and ISO Project co-Leader (and US TAG chair) for ISO 32000 and ISO 14289.