Some Comments on Accessibility of PDF Format Resources
Barry McMullin
Last Revised: 14 January 2007
Introduction
The Portable Document Format, or PDF, is in widespread use as a format for electronic publication of documents. This note provides some brief and relatively informal comments on the issue of access to such documents for people with disabilities and, in particular, on conformance with the W3C Web Content Accessibility Guidelines (WCAG 1.0).
Advantages of PDF
While there are probably many factors behind the popularity of PDF as a format for electronic document publication, a key one appears to be its very easy integration with established, print-oriented publication processes. Virtually all print-oriented document authoring or preparation systems are capable of generating PDF output. Given a publication system that is already in operation producing print documents, there is virtually zero incremental cost to producing PDF format electronic documents that are essentially identical in visual layout and appearance to the print versions. These can be deployed directly through the web, or through other electronic channels as appropriate. While users require appropriate "reader" software to access these documents, such software is available, free of charge, for many popular computer platforms.
From the viewpoint of accessibility for people with disabilities, PDF publication is generally at least a minimal improvement on print-only publication. Print is directly usable with some assistive technologies (such as optical magnifiers); but for many users it must first be transformed into an electronic, digital form before they can have even minimal access. Examples include users who need speech synthesis, braille display, or cognitive assistance such as dictionary or thesaurus lookup. Such "(re-)digitisation" minimally involves scanning and optical character recognition (OCR). Clearly, then, the direct availability of PDF versions at least removes the need for scanning, and usually also for OCR. Further, since such (re-)conversion is intrinsically prone to introducing errors, the "quality" of almost any directly produced PDF document will be at least somewhat better than what re-digitisation would achieve.
"Historical" Problems with PDF Accessibility
In the early years of PDF development, there were two significant problems with its use for people with disabilities:
-
It was sometimes used simply as a "container" format for, effectively, (re-)scanned images of print documents. While this does save a little effort in scanning the print documents, it is of very limited benefit for people needing a genuine ("structural" or even fully "semantic") digital representation. Such a document still requires OCR, and the resulting text is subject to all the usual quality problems of the OCR process.
-
Early versions of the PDF format (and PDF reader software) lacked specific features needed to properly support a wide variety of users with disabilities. This was because PDF evolved from the earlier PostScript language, and was initially still primarily a "page description" format, i.e., it was optimised for describing visual layout and appearance in print media, rather than tailored for the sorts of digital, semantic transformations that are of most benefit to people with disabilities.
Both of these problems are much less significant now than formerly. Now that PDF generation is integrated into authoring and publication systems, it is no longer necessary to rely heavily on "re-scanned" or "bitmap" content in PDF; and both the format and the reader software have been developed to incorporate all the semantic transformation and adaptation capabilities recommended for best-practice accessibility for users with disabilities.
Current Practice
However, although the historical, intrinsic deficiencies of PDF have largely been resolved, significant practical issues remain with the accessibility of PDF resources on the web:
-
Although the PDF specification, and much of the available reader software, now have good accessibility capabilities, the effective use of those capabilities is still completely at the mercy of document authors and publishers. In practice, the vast majority of PDF currently being encountered by, or reported to, the DCU e-Access laboratory is still essentially "presentational" in style. That is, it lacks the textual alternatives for graphical elements, and the effective separation between document structure and (visual) presentation, that are essential to effective transformation for people with a wide variety of disabilities. This form of essentially "visual" PDF is comparable to the result of a traditional "scanning and OCR" process. It has some minimal benefits over print-only publication for people with disabilities, but falls very far short of what would now be recognised as professional best practice in accessible document publication.
-
Even in those (still comparatively rare) cases where the available PDF does incorporate the best accessibility techniques and design, access to it still relies in practice on users' access to compatible technology (both the reader software itself, and its integration with their individual assistive technologies) and on the users' own skill in using that technology. As noted above, accessible reader software, and its integration with assistive technologies, is continuing to develop and improve, but is not yet mature. In any case, for some users (particularly those affected by a variety of cognitive disabilities), we conjecture that the need to master an additional set of interface skills (over and above those used in browsing "native" X/HTML+CSS documents) may be a significant extra burden or barrier. (Cognitive disabilities may clearly affect a user's ability to interpret a document in any format; the point here is that unnecessary variety in user interfaces may add a further burden to that.)
WCAG (1.0) Conformance
WCAG 1.0, checkpoint 11.1 (priority 2) states:
Use W3C technologies when they are available and appropriate for a task and use the latest versions when supported.
For the resources that are typically published in PDF format, essentially the same functional capabilities are generally available using the standard W3C technologies of X/HTML and CSS. These technologies are mature, stable, and now widely supported. PDF, on the other hand, is not a W3C technology, per se. Thus, it is our view that publishing resources on the web in PDF format only should generally be regarded as a violation of this checkpoint; and such resources should not be included within the scope of a conformance claim to WCAG 1.0 at level Double-A (or above).
(In theory, there may be exceptional, and untypical, situations where the document content cannot be effectively rendered in X/HTML+CSS, in which case this checkpoint might not apply; but these are extremely rare in our experience. This particular conformance constraint may also change if, or when, WCAG 2.0 reaches formal recommendation status; however, at the time of writing, WCAG 2.0 is still a working draft, which formally means it can be cited only as a "work in progress".)
Quite aside from checkpoint 11.1's general deprecation of non-W3C technologies, the vast majority of PDF resources encountered on the web (as with the majority of X/HTML resources also) fail a variety of other applicable priority 2 checkpoints, such as:
-
WCAG 1.0, checkpoint 3.5 (priority 2):
Use header elements to convey document structure and use them according to specification. [...]
-
WCAG 1.0, checkpoint 3.6 (priority 2):
Mark up lists and list items properly. [...]
Further, many PDF resources published on the web also fail to satisfy various WCAG 1.0 checkpoints even at priority 1, such as:
-
WCAG 1.0, checkpoint 1.1 (priority 1):
Provide a text equivalent for every non-text element [...]
-
WCAG 1.0, checkpoint 5.1 (priority 1):
For data tables, identify row and column headers. [...]
Clearly, any resources published on a web site in PDF format only, which fail these or other priority 1 checkpoints, should not be included within the scope of a conformance claim to WCAG 1.0 at any level.
To be clear: it is perfectly permissible, and even desirable, to use PDF as an alternative format for resources on a web site. If the same resources are also available in versions using W3C technologies (X/HTML+CSS, etc.), and those versions satisfy all applicable checkpoints, then it is correct and appropriate to include those resources within the scope of a WCAG 1.0 conformance claim at the relevant level. Thus, the use of PDF is not, in itself, an intrinsic barrier to WCAG 1.0 conformance, and may indeed be a very valuable enhancement of a site for many users (specifically including users with a variety of non-print-related disabilities).
PDF "Advantage" Revisited?
As already noted, a key advantage of PDF typically reported to us by authors and publishers is precisely that it involves virtually zero additional cost over and above an existing, print-oriented, publication process.
However, it is now clear that this "advantage" relies on a hidden, implicit, and discriminatory assumption about electronic publication: namely, that the only functional requirement of the electronic version is to faithfully reproduce the precise visual appearance (layout, style, etc.) of a corresponding print publication. The unavoidable effect is that this form of electronic publishing inherits many of the same access barriers that were already present in the print publication. This is, of course, a lost opportunity; and it formally means that such publications (if they are the only electronic version provided) definitely do not conform to WCAG 1.0 (nor, indeed, even to the current draft of WCAG 2.0).
This is no longer a problem intrinsic to the PDF format or the available reader software. It is possible, in principle, for authors and publishers to produce so-called "accessible PDF": a form of PDF in which the specific features and techniques needed to support accessibility for users with disabilities have been properly applied. The difficulty is that this form of PDF cannot be generated "automatically", with essentially zero extra effort, from existing print-oriented tools and processes. Preparing documents in this way requires significant, appropriately skilled, manual effort. This effort is typically very modest compared to the total cost of authoring or publication, and would be likely, in our judgement, to fall within typical tests of "reasonable accommodation" (required by law or policy in various contexts). However, it is not zero.
Accordingly, if it is required to produce electronic publications that meet the generally recognised standards of "best practice" for access by users with disabilities (as codified in WCAG and elsewhere), then a key advantage typically perceived for PDF publication no longer applies.
Further, given the nature of the work involved in making documents accessible, PDF publication appears to have no particular advantage, in this regard, over X/HTML+CSS publication. Indeed, both common sense and informal feedback from agencies engaged in this work suggest that the cost of producing an "accessible" electronic document from a print-oriented "original" is more or less the same regardless of whether one chooses PDF or X/HTML+CSS as the target format. Since X/HTML+CSS has the specific benefit of not requiring the user to install, configure, or learn how to use a separate document reader (whether implemented as a stand-alone tool or a browser "plug-in"), and would unambiguously meet the technical terms of WCAG 1.0 level Double-A (checkpoint 11.1), there is a clear argument to prefer (accessible) X/HTML+CSS as the "primary" electronic publication format.
Of course, this is without prejudice to the provision of documents in (accessible) PDF as well, as an additional, alternative format. In fact, any publishing process properly built around generic markup (e.g., DocBook, LaTeX, etc.) using the single source master concept allows multiple accessible formats to be produced from a single "accessible master document", at zero incremental cost. This is really the essential principle underlying WCAG 1.0 Guideline 3 ("Use markup and style sheets and do so properly") and should be considered the preferred approach wherever practical.
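To make the single source master idea a little more concrete, the following is a minimal Python sketch under stated assumptions; it is not drawn from DocBook, LaTeX, or any real publishing toolchain. The master format (the `MASTER` list and its element kinds) and the two renderers, `to_xhtml` and `to_plain_text`, are hypothetical names invented for illustration. The point it shows is that structure (heading levels, list items) and text alternatives are authored once, in the master, and are then carried automatically into every output format.

```python
from html import escape

# Hypothetical master document: each element records structure
# (heading level, list items, text alternative), never visual appearance.
MASTER = [
    {"kind": "heading", "level": 1, "text": "Annual Report"},
    {"kind": "paragraph", "text": "Sales rose this year."},
    {"kind": "image", "src": "chart.png",
     "alt": "Bar chart: sales rose from 10 to 14 units."},
    {"kind": "list", "items": ["North region", "South region"]},
]


def to_xhtml(doc):
    """Render the master as XHTML: real header elements, ul/li lists, alt text."""
    out = []
    for el in doc:
        kind = el["kind"]
        if kind == "heading":
            n = el["level"]
            out.append(f"<h{n}>{escape(el['text'])}</h{n}>")
        elif kind == "paragraph":
            out.append(f"<p>{escape(el['text'])}</p>")
        elif kind == "image":
            out.append(f'<img src="{escape(el["src"])}" alt="{escape(el["alt"])}" />')
        elif kind == "list":
            items = "".join(f"<li>{escape(i)}</li>" for i in el["items"])
            out.append(f"<ul>{items}</ul>")
        else:
            raise ValueError(f"unknown element kind: {kind}")
    return "\n".join(out)


def to_plain_text(doc):
    """Render the same master as linear text (e.g. for speech or braille)."""
    out = []
    for el in doc:
        kind = el["kind"]
        if kind in ("heading", "paragraph"):
            out.append(el["text"])
        elif kind == "image":
            # The text equivalent written once in the master is reused here.
            out.append(f"[Image: {el['alt']}]")
        elif kind == "list":
            out.extend(f"- {item}" for item in el["items"])
        else:
            raise ValueError(f"unknown element kind: {kind}")
    return "\n".join(out)


if __name__ == "__main__":
    print(to_xhtml(MASTER))
    print()
    print(to_plain_text(MASTER))
```

The design choice worth noting is that neither renderer has to infer structure or invent a text alternative: if the master omits the `alt` entry, both outputs fail together, which is why the accessibility effort belongs in the master rather than in any one output format.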
Summary Recommendation
Based on the above considerations, our recommendation is that the great majority of documents currently being published on the web in PDF format should be accompanied by fully accessible X/HTML+CSS versions, conveying all the same content and functionality.
This recommendation naturally has most accessibility impact when (as is currently typical) the PDF in question has not been designed as "accessible PDF"; however, we emphasise that we still hold to this recommendation even when considering "accessible PDF" documents. That is, while we certainly encourage the provision of "accessible PDF" versions of any document, our recommendation is that these should always be in addition to, not instead of, a primary, accessible X/HTML+CSS version. This is based on the experience that there is little, if any, cost difference in producing either of these (or, indeed, both, using a "single source master" publishing process) from the same "non-accessible" original document; and that the "accessible" X/HTML+CSS version will still carry modest, but non-negligible, accessibility benefits over and above even an "accessible" PDF version.
Note also that, while we acknowledge that there will generally be some marginal additional cost in making any document "accessible" (regardless of whether this is "accessible PDF", "accessible X/HTML+CSS", or an "accessible single source master"), we consider that this cost will usually be very small compared to overall document production costs, and should be considered within the scope of "reasonable accommodation".
Going forward, costs will generally be minimised by incorporating accessibility techniques and design directly into all stages of the authoring/publishing process: i.e., redesigning the process from being print-oriented (with a "bolt-on" of naive "presentational PDF" generation) to being properly digital media oriented, based on a single source master in an appropriate generalised markup format. As is usual with Universal Design/Design for All practices, this is likely to have additional spin-off benefits over and above non-discriminatory access for users with disabilities.
Acknowledgements
This note has benefited from extensive discussions, over an extended period, with many colleagues involved in all aspects of producing and using accessible electronic documents in many different contexts. We particularly appreciate comments from members of the Irish Design for All e-Accessibility Network. We are grateful to RiverDocs Limited for supporting some aspects of this review. All errors remain, of course, the responsibility of the author.


