Improving Access to Online Information via Valid HTML Mark Up
Presented at the conference The Good, The Bad and The Irrelevant, Helsinki, 3--5 September 2003.
Carmen Marincu
Dr. Barry McMullin
Contents
- Abstract
- Introduction
- Methodology
- Key Results
- Representative HTML Defects
- Future Research
- Conclusion
- Bibliography
- Appendix A: Case-variations of Formal Public Identifiers
Abstract
Information and Communication Technologies (ICT) are playing an increasingly important part in our day-to-day lives. Perhaps the most remarkable innovation in ICT is the development of the Internet, with its power to make information universally available. The Internet brings together different genders, generations, and cultures, disregarding boundaries such as location or time. It is probably the "application" with the most diverse group of users in the history of computing.
Among these diverse users, those with disabilities have particular opportunities to benefit. Using the Internet in conjunction with dedicated assistive technologies, tasks that were very difficult if not impossible to achieve for people with various types of disability can now be made fully accessible – at least, in principle. However, in practice, many online resources and services are still poorly accessible to those with disability due to unsatisfactory web content design.
One important aspect of achieving web content accessibility is compliance with technical inter-operability standards. A common practice in current web development is designing and testing for compatibility with a small number of “popular” browsers in “standard” configurations, rather than designing for compliance with generic technical standards for inter-operability. But, almost by definition, many users with disabilities need to use special-purpose technologies: either “minority” browsers, or mainstream browsers with unusual configurations or coupled to specialised assistive devices. Such users are, accordingly, especially reliant on compliance with generic inter-operability standards.
This paper presents results of a survey of HTML standards compliance for two samples of web sites, one drawn from Ireland, the other from the UK. It analyses the most common HTML defects encountered, and considers their potential impact on web accessibility. It also gives some recommendations to improve HTML compliance.
A particular conclusion of the study is that the general level of HTML standards compliance in both the Irish and UK samples is very poor, and that the pattern of failure is strikingly consistent across the two samples. Although considerable efforts are being made to promote web accessibility for users with disabilities in both jurisdictions, these efforts are certainly not yet manifesting themselves in improved HTML validity.
1. Introduction
The significant benefits brought to our society by the Internet are well known. It reduces barriers of distance and time, and creates a society in which – in principle – anyone can have access to products and services all over the world at any time.
The people who could, arguably, benefit most dramatically from this are those who, because of some disability, have restricted access to information and services in the physical world. Using dedicated assistive technologies, they can have access to the online version of the desired services. For example, a blind user can "read" the online version of the daily edition of her favourite newspaper or her bank statement, a user with restricted mobility can visit virtual stores from the comfort of his home, a student with cognitive disability can take her own time in understanding the taught material. [5]
An assistive technology is any device or tool (hardware or software) which adapts a conventional system for use by a person with a disability. Most assistive technologies used in browsing the Web do not operate on their own; instead, they act as an interface between a person with a disability and the “mainstream” software and hardware that a user with no disability would employ to perform the same operation or action.
The assistive technologies required differ according to the specific disability of the Internet user. Since most web content is textual or graphical, persons with hearing impairments can be said to be the category least disadvantaged on the web, whilst those with blindness or other serious visual impairments are potentially the most affected. Between these extremes fall those with a variety of other disabilities, such as motor impairments and learning and cognitive disabilities.
Currently, there is little specific assistive technology for deaf users. Persons with mobility impairments usually have trouble using the hardware rather than understanding the information provided online. Depending on the severity of the disability, the solutions range from slow keys and onscreen keyboards (where the user selects the needed character on an onscreen representation of a keyboard) to more advanced switches and scanning software (using a switch activated by the movement of a body part, for example a head nudge, as a yes/no selector) and word prediction, which can save the effort of "typing in" the characters. Similarly, persons with vision impairments might use a variety of assistive technologies. A person with low vision, for example, might use screen magnification software which turns the mouse pointer into a magnifying glass, and a colour-blind person might use software to override the server-suggested colours. For blind people the assistive technology is more complex, since it has to render information designed for a visual medium in an alternative medium. Assistive technologies for blind users include braille displays and screen readers, which are dedicated software programs that read aloud onscreen text, menus, icons and the like.
The typical practice of many web content developers is to test web site functionality only against a small number of “popular” web browser platforms in “normal” configurations. But, although the content might seem to be rendered correctly on such a platform, this does not guarantee that it is designed correctly. Most of the mainstream browsers are designed to “guess” at and heuristically “repair” technical HTML defects. Thus, if the content is not rendered as the author expects, further “adaptations” of the content to the specific and non-standard “quirks” of a particular browser are made until the author is satisfied with the (purely visual) appearance of the web site. This can create a vicious circle that not only increases the download time of the web site content, but also dramatically decreases the chances of the same content being rendered satisfactorily on a different browser or platform. This particularly affects users with disabilities since, by definition, users of specialised assistive technologies are not using the “popular” platforms, or at least not in the “normal” configurations; rather, they must depend on equipment tailored to their particular needs. As a result, it frequently happens that web sites are poorly accessible, or completely inaccessible, to such users.
This situation would be quite different if web sites were designed keeping in mind “write once, read everywhere”, which can be achieved by designing web content to meet appropriate guidelines and technical standards for interoperability. Provided both the server side and the client side conform to such guidelines and standards, the client platform can be tailored to individual user needs, and still interoperate effectively with all conforming servers.
An important source for accessible web design resources is the W3C's Web Accessibility Initiative. WAI published the Web Content Accessibility Guidelines (WCAG 1.0) in May 1999 [6]. This is now a reference point in achieving web accessibility in many of the E.U.'s Member States [3,4]. In Ireland, The National Disability Authority (NDA) have adopted WCAG 1.0 into the national “Guidelines for Web Accessibility”. In the UK, WCAG 1.0 is a source for the “Guidelines for UK Governmental web sites”, published by the Cabinet Office in May 2002.
A major step in achieving web accessibility is to use mark-up for its intended purpose when designing web content and, in particular, to conform to relevant technical standards. WCAG 1.0 Checkpoint 3.2 expresses this as: “Create documents that validate to published formal grammars.”
This checkpoint is assigned priority level 2 which is defined as follows:
A Web content developer should satisfy this checkpoint. Otherwise, one or more groups will find it difficult to access information in the document. Satisfying this checkpoint will remove significant barriers to accessing Web documents.
The WCAG guidelines are under ongoing review. The most recent public working draft of version 2.0 was published on 24th June 2003 [1]. In this version, the need for inter-operability is expressed in Guideline 4:
Use Web technologies that maximize the ability of the content to work with current and future accessibility technologies and user agents.
In this draft WCAG 2.0 the checkpoints are classified as “core” and “extended”. Each checkpoint also specifies “Required Success Criteria” and “Best Practice” items. To claim even the minimum level of conformance to WCAG 2.0, “[a web] resource must satisfy all required success criteria for all Core checkpoints.”
4.1 [CORE] Technologies are used according to specification. [1]
The “Required Success Criteria” for this checkpoint are:
- for markup, except where the site has documented that a specification was violated for backward compatibility, the markup has:
  - passed validity tests of the language (whether it be conforming to a schema, Document Type Definition (DTD), or other tests described in the specification)
  - structural elements and attributes are used as defined in the specification
  - accessibility features are used
  - deprecated features are avoided
- for Application Programming Interfaces (API's), programming standards for the language are followed.
- accessibility features and API's are used when available.
Thus, in this draft WCAG 2.0 the requirement for valid mark-up is not only reiterated, but increased in effective priority (from “should” to “must”).
Prompted by these provisions of WCAG 1.0 and the draft WCAG 2.0, a study specifically regarding HTML validity of a sample of Irish and UK web sites was conducted in May 2003. This paper presents the techniques used, the key results, and an analysis of the most common HTML mark-up defects encountered.
2. Methodology
In this section the web sampling methodology and the HTML compliance evaluation methodology will be discussed. The information generated during these processes was recorded in a PostgreSQL database for further analysis.
2.1 Web Sampling
Selecting a representative sample of web sites for a given country is not a simple process. For the purpose of this survey it was considered that an open directory would provide a good basis. When implementing the technologies used in the surveying process, the goal was to use as many automated tools as possible. The Open Directory Project (ODP) offers its content for download structured as RDF (although it is not guaranteed to be well-formed RDF), which made it a particularly suitable source for our purposes.
The information provided by ODP is structured in a hierarchical tree of categories (similar to the Google Web Directory; indeed, the latter is derived from ODP data). Each web site is assigned to a specific category, representing the subject of the web site as closely as possible. The web sites considered for the Irish sample were taken from the category “Ireland” and its sub-categories (online version: "http://www.dmoz.org/Regional/Europe/Ireland") and the web sites considered for the UK sample were taken from the category “United Kingdom” and its subcategories (online version: "http://www.dmoz.org/Regional/Europe/United_Kingdom"). At the time when the samples were extracted (13th February 2003), the Ireland category and its sub-categories contained 5,440 web sites and the UK category and its sub-categories contained 114,044 web sites.
Considering the significant difference in the number of web sites between the two categories, it was considered that the best approach when deciding the number of web sites in each sample would be to use a fixed fraction or percentage of the category total. On the basis of the communication, processing, and storage resources available for the study, this was set at 5%, which equates to 272 sites in the Irish sample and 5,702 sites in the UK sample. These samples were then selected primarily from the following sub-categories:
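The fixed-fraction rule reduces to simple arithmetic. The Python sketch below reproduces the two sample sizes from the category totals quoted above; rounding to the nearest whole site is our assumption, since the text does not state how fractional results were rounded.

```python
# Category totals from the ODP snapshot of 13 February 2003, as quoted in the text.
CATEGORY_TOTALS = {"Ireland": 5_440, "United Kingdom": 114_044}
SAMPLE_FRACTION = 0.05  # the fixed 5% fraction applied to both categories


def sample_size(total: int, fraction: float = SAMPLE_FRACTION) -> int:
    """Sites to draw for a fixed-fraction sample, rounded to the nearest site (assumed)."""
    return round(total * fraction)


for name, total in CATEGORY_TOTALS.items():
    print(f"{name}: {sample_size(total):,} of {total:,} sites")
```

Running it prints 272 of 5,440 sites for Ireland and 5,702 of 114,044 sites for the United Kingdom, matching the figures above.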
- Arts and Entertainment
- Business and Economy
- Education
- Government
- Health
- News and Media
- Recreation and Sports
- Science and Environment
- Society and Culture
- Transportation
Because web sites vary significantly in size and in the type of media in which resources are offered, each web site's content was also sampled, on the following basis:
- Only HTML resources were captured (due to the specific nature of the study).
- The maximum link depth of the pages to be retrieved was set to 3. This assumes that the most significant pages should be reasonably closely linked to the main (“home”) page.
- The maximum amount of data captured from a single web site was set to 225 KB. It is considered that defects found within such a sample are likely to be repeated over the web site's content.
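The per-site limits above can be read as a bounded breadth-first crawl. The following Python sketch is not how pavuk works internally: it runs over a hypothetical in-memory model of a site (the `Resource` type and `sample_site` function are illustrative names), and it assumes that the home page sits at depth 0, that 1 KB is 1,024 bytes, and that capture stops at the first page that would exceed the byte budget.

```python
from collections import deque
from dataclasses import dataclass, field

MAX_DEPTH = 3            # home page is depth 0 (assumed reading of "link depth")
MAX_BYTES = 225 * 1024   # per-site budget of 225 KB, taking 1 KB = 1024 bytes


@dataclass
class Resource:
    """A resource in a hypothetical in-memory model of a web site."""
    content_type: str
    size: int
    links: list[str] = field(default_factory=list)


def sample_site(site: dict[str, Resource], home: str) -> list[str]:
    """Breadth-first capture of HTML pages within the depth and byte limits."""
    captured: list[str] = []
    total = 0
    seen = {home}
    queue = deque([(home, 0)])
    while queue:
        url, depth = queue.popleft()
        res = site.get(url)
        if res is None or res.content_type != "text/html":
            continue  # only HTML resources are captured (and followed)
        if total + res.size > MAX_BYTES:
            break  # stop once the next page would exceed the site budget
        captured.append(url)
        total += res.size
        if depth < MAX_DEPTH:
            for link in res.links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return captured


site = {
    "/": Resource("text/html", 10_000, ["/a", "/logo.png"]),
    "/a": Resource("text/html", 10_000, ["/b"]),
    "/b": Resource("text/html", 10_000, ["/c"]),
    "/c": Resource("text/html", 10_000, ["/d"]),
    "/d": Resource("text/html", 10_000),
    "/logo.png": Resource("image/png", 5_000),
}
print(sample_site(site, "/"))  # ['/', '/a', '/b', '/c']
```

Page "/d" lies at depth 4 and the image is not HTML, so neither is captured. Stopping at the first over-budget page, rather than skipping it and trying smaller pages, is one simple design choice among several.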
The mirroring process (sampling of each web site's content) used the web content mirroring robot pavuk. A total of 3,319 individual web pages (totalling 29 MB) were retrieved for the Irish sample and 67,598 pages (totalling 552 MB) for the UK sample. It was considered that, for the survey's results to be reasonably representative of a web site, each site sample should have at least 3 web pages and at least 100 kB of data. There were 9 Irish web sites and 258 UK web sites for which no data was captured, possibly due to network disruptions or other server-side failures. For another 91 Irish web sites and 1,669 UK web sites only the home page was captured, indicating a failure to follow any link from the home page.
Thus, in the end, the validity analysis was performed on 123 Irish sites (45% of the original sample) totaling 2,288 pages, and 2,380 UK sites (41% of the original sample) totaling 45,857 web pages.
2.2 The HTML Validity Analysis
An HTML page is properly built when its mark-up conforms to a standard technical specification. Each standard is specified by a Document Type Definition (DTD), which describes the entities, elements and attributes that can be part of an HTML document, and how they can be interrelated. Because most existing web browsers are able to process web pages which do not conform to a DTD, many failures in HTML code can pass unnoticed by most users. But such code defects can be a real impediment to access for users with disabilities who rely on special-purpose web browsers and dedicated assistive technologies. They also complicate, and therefore inhibit, ongoing development of such niche technologies.
The technologies used in rendering web content for people with disabilities are designed to recognise mark-up elements, interpret their functionality and deliver the web content in a form that keeps the structure and functionality of the content as intended by the web developer. For this to be achieved, properly built HTML mark-up is crucial.
There are various tools that can be used to validate HTML code against its description in the corresponding DTD. The output is usually a list of the problems encountered (diagnostics), with suggestions as to how they could be fixed. A list of such tools can be seen on the WAI's web page at http://www.w3.org/WAI/ER/existingtools.html.
The tool chosen for the HTML compliance tests in this study is onsgmls. onsgmls is a parser and validator of SGML files (HTML and XHTML) and it is part of the OpenSP collection of SGML/XML processing tools.
onsgmls implements 438 individual diagnostics which can be triggered when an element or attribute in the HTML content is not used according to its specification in the HTML page's DTD.
Each diagnostic is assigned a “severity level” as follows:
- error (318 distinct diagnostics): triggered when there is a clear non conformance between the HTML content and its specification in the DTD.
- warning (94 distinct diagnostics): triggered when the validator encounters the use of an HTML entity, element or attribute, which is technically valid but unlikely to be intended in normal mark-up.
- quantity error (26 distinct diagnostics): onsgmls has a set of variables that determine numerical limits on some document characteristics. If, during the validation process, a relevant characteristic violates such a limit, a quantity error diagnostic is raised. onsgmls sets these limits to default values which are considered reasonable for HTML documents, so if they are violated it can reasonably be assumed that at least some user agents would have difficulty rendering the content.
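As an illustration of how diagnostics can be tallied by severity, the Python sketch below parses onsgmls output lines. It assumes the colon-separated layout `program:file:line:column:TYPE: message`, with TYPE `E`, `W` or `Q` for error, warning and quantity error; this layout and the sample messages are assumptions to check against the OpenSP version in use, not a documented contract.

```python
import re
from collections import Counter

# Assumed layout of an onsgmls diagnostic line:
#   program:file:line:column:TYPE: message
DIAG_RE = re.compile(
    r"^(?P<prog>[^:]+):(?P<file>.+?):(?P<line>\d+):(?P<col>\d+):"
    r"(?P<type>[EWQ]):\s*(?P<msg>.*)$"
)
SEVERITY = {"E": "error", "W": "warning", "Q": "quantity error"}


def classify(lines: list[str]) -> Counter:
    """Count diagnostics per severity level, ignoring lines that do not match."""
    counts: Counter = Counter()
    for line in lines:
        match = DIAG_RE.match(line.strip())
        if match:
            counts[SEVERITY[match["type"]]] += 1
    return counts


# Illustrative lines only; they are not taken from the study's data.
sample = [
    'onsgmls:index.html:12:30:E: there is no attribute "BGCOLOR"',
    'onsgmls:index.html:40:5:E: end tag for element "P" which is not open',
    "onsgmls:index.html:3:1:W: illustrative warning text",
    "onsgmls:I: illustrative informational line without a position",
]
print(classify(sample))
```

Lines without a position, such as the last sample line, are skipped rather than guessed at.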
The validation process implemented by onsgmls involves comparing the use of each component in an HTML document with its specification in the DTD of the HTML standard used in the document. Thus, in order for onsgmls to generate consistent validation results, it needs a properly specified DTD. For example, the W3C HTML 4.01 specification states that “a valid HTML document declares what version of HTML is used in the document”. This is normally done via a DOCTYPE declaration in the beginning of the HTML page.
Accordingly, if the DOCTYPE declaration is missing (or unrecognised), the web page will immediately and automatically fail the HTML validation tests. In such a case it would be possible to configure onsgmls to assume a default DTD against which the document should be validated. However, detailed results generated by validating such pages against an assumed DTD are not considered relevant to our study, since the document might have validated against some other standard DTD had the correct one been specified.
Considering this, the final analysis and report are generally based only on the diagnostics that were triggered in web pages that contained a correct DTD in their HTML content. Missing or incorrectly specified DTD information is an HTML defect which will be considered separately from the final results.
The mapping between the HTML standard used in a document to be tested and the document type declaration corresponding to that standard is made through formal public identifiers (FPI), components of the DOCTYPE declaration.
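A minimal sketch of locating the FPI inside a DOCTYPE declaration follows. It assumes double-quoted identifiers introduced by the `PUBLIC` keyword; SGML also permits single quotes and other forms that this regular expression deliberately ignores. The function name `extract_fpi` is hypothetical.

```python
import re
from typing import Optional

# Matches e.g. <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>[A-Za-z][\w.-]*)\s+PUBLIC\s+"(?P<fpi>[^"]+)"'
    r'(?:\s+"(?P<system>[^"]*)")?\s*>',
    re.IGNORECASE,
)


def extract_fpi(document: str) -> Optional[str]:
    """Return the formal public identifier of the first DOCTYPE found, or None."""
    match = DOCTYPE_RE.search(document)
    return match["fpi"] if match else None


page = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    '"http://www.w3.org/TR/html4/loose.dtd">\n<html><body></body></html>'
)
print(extract_fpi(page))  # -//W3C//DTD HTML 4.01 Transitional//EN
```

A page with no such declaration yields None, which corresponds to the “no DOCTYPE” case excluded from the detailed analysis.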
For an SGML processor (such as onsgmls) the formal public identifiers and the DTD they are mapped to are usually specified in a “catalog” file. Appropriate catalog files are generally made public with each HTML standard published. In the configuration used in this study, onsgmls uses case-sensitive matching of FPIs when determining the DTD against which a document should be validated. However, it was found in practice that a significant number of declarations failed to match appropriate DTDs precisely because the FPI differed only in case from one in a known catalog. It was decided to enable validation in such cases, by manually adding additional catalog entries for such case-variant FPIs, mapping them onto the appropriate standard DTDs. The FPI case-variations permitted in this way are listed in Appendix A.
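The catalog workaround can be sketched as follows. Given the FPIs actually observed in the sample, the hypothetical helper below emits extra SGML catalog `PUBLIC` entries for any FPI that differs only in case from a known one. The three HTML 4.01 FPIs and DTD file names form an illustrative subset, not the full catalog used in the study.

```python
# Illustrative subset of canonical HTML 4.01 FPIs and their DTD files.
CANONICAL = {
    "-//W3C//DTD HTML 4.01//EN": "strict.dtd",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "loose.dtd",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "frameset.dtd",
}


def case_variant_entries(observed_fpis, canonical=CANONICAL) -> list[str]:
    """Catalog PUBLIC entries mapping case-variant FPIs onto the standard DTDs."""
    by_folded = {fpi.casefold(): dtd for fpi, dtd in canonical.items()}
    entries = []
    for fpi in sorted(set(observed_fpis)):
        if fpi in canonical:
            continue  # an exact (case-sensitive) match already resolves
        dtd = by_folded.get(fpi.casefold())
        if dtd is not None:  # differs from a known FPI only in case
            entries.append(f'PUBLIC "{fpi}" "{dtd}"')
    return entries


observed = [
    "-//W3C//DTD HTML 4.01 Transitional//EN",
    "-//W3C//DTD HTML 4.01 transitional//EN",
    "-//W3C//DTD HTML 4.01 TRANSITIONAL//EN",
    "-//Example//DTD Unknown//EN",
]
for entry in case_variant_entries(observed):
    print(entry)
```

FPIs that match nothing even case-insensitively (the last example) get no entry and remain “not usable”.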
In general, when analyzing the results, it was considered that if one diagnostic is triggered in at least one web page of a web site sample, the web site should be counted in statistics regarding that diagnostic. However, this rule was not applied to the "no DOCTYPE declaration" diagnostic since 98.4% of the Irish sites and 99.0% of the UK sites had at least one page with missing or unrecognised document type information. Instead, the web pages that triggered this diagnostic were eliminated from the overall sample. Then each site of the resulting sample (having now only the web pages with usable DTD information) was again tested for compliance with the minimum amount of data (100kB) and the minimum number of pages (3) required, per web site, as previously outlined.
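The counting rule above, together with the re-applied minimum-size check, can be sketched in Python as follows. The `Page` record and its fields are hypothetical, and 100 kB is taken as 102,400 bytes (an assumption).

```python
from dataclasses import dataclass

MIN_PAGES = 3
MIN_BYTES = 100 * 1024  # 100 kB, taking 1 kB = 1024 bytes (assumed)


@dataclass
class Page:
    """Hypothetical per-page validation record."""
    usable_doctype: bool
    size: int
    diagnostics: frozenset[str]


def site_diagnostics(sites: dict[str, list[Page]]) -> dict[str, set[str]]:
    """Distinct diagnostics per eligible site, after dropping pages without a usable DOCTYPE."""
    result = {}
    for site, pages in sites.items():
        kept = [p for p in pages if p.usable_doctype]
        if len(kept) < MIN_PAGES or sum(p.size for p in kept) < MIN_BYTES:
            continue  # site no longer meets the minimum sample requirements
        # A site counts for a diagnostic if at least one of its pages triggers it.
        result[site] = set().union(*(p.diagnostics for p in kept))
    return result


def share_of_sites(per_site: dict[str, set[str]], diagnostic: str) -> float:
    """Fraction of eligible sites on which the diagnostic was triggered at least once."""
    return sum(diagnostic in d for d in per_site.values()) / len(per_site)
```

For example, a site with four 40,000-byte pages, one of which lacks a usable DOCTYPE, keeps three pages (120,000 bytes) and stays eligible, whereas a site left with a single usable page is dropped.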
3. Key Results
Of the web sites studied, only one Irish site and one UK site had completely valid HTML mark-up.
Of the 2,288 web pages in the Irish sample, and the 45,857 pages in the UK sample, analysed for HTML validity:
- 76% of the Irish web pages and 75% of the UK web pages had no DTD information.
- 1.8% of the Irish web pages and 2.1% of the UK web pages had DTD information that was not usable (typically, the PUBLIC identifier was unrecognised, even when case insensitive comparison was applied).
Of the 45 Irish web sites (of 123 web sites considered for tests) and 843 UK web sites (of 2380 web sites considered for tests) having at least 3 pages with usable DTD information:
- 22% of Irish sites and 30% of UK sites triggered up to 5 distinct error diagnostics
- 40% of Irish sites and 38% of UK sites triggered between 6 and 10 distinct error diagnostics
- 31% of Irish sites and 27% of UK sites triggered between 11 and 15 distinct error diagnostics
- 2.2% of Irish sites and 5.0% of UK sites triggered between 16 and 20 distinct error diagnostics
- 2.2% of Irish sites and 0.1% of UK sites triggered more than 20 distinct error diagnostics
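The banding used above can be expressed as a small function. This sketch assumes the “up to 5” band also includes sites with no error diagnostics at all, which the text does not state; the counts in the usage line are made up for illustration.

```python
from collections import Counter

# Inclusive upper bound and label of each band used in the results above.
BANDS = [(5, "up to 5"), (10, "6-10"), (15, "11-15"), (20, "16-20")]
TOP_BAND = "more than 20"


def band(distinct_errors: int) -> str:
    """Band label for a site's number of distinct error diagnostics."""
    for upper, label in BANDS:
        if distinct_errors <= upper:
            return label
    return TOP_BAND


def distribution(per_site_counts: list[int]) -> dict[str, float]:
    """Percentage of sites falling into each band."""
    tally = Counter(band(n) for n in per_site_counts)
    labels = [label for _, label in BANDS] + [TOP_BAND]
    return {label: 100 * tally[label] / len(per_site_counts) for label in labels}


print(distribution([3, 7, 12, 25]))  # hypothetical counts for four sites
```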
As can be seen, although the Irish and UK samples differed significantly in size, the overall pattern of results is remarkably similar.
4. Representative HTML Defects
By far the most common HTML mark-up defect found in the validation tests was the absence or misconstruction of the document type information [98.4% of the Irish sites and 99.0% of the UK sites]. A correctly specified document type declaration is obviously crucial in the validation process of a web page. But, more importantly, when the document type information is correctly specified in an HTML document, the web browser knows how the document is constructed, and its content and functionality can therefore be rendered consistently and as intended. An HTML document without a usable DTD makes it difficult for web browsers to behave consistently or reliably, because the mark-up structure is unpredictable.
Moreover, starting with Internet Explorer 5.0 on Macintosh (released in March 2000), there is now a trend for mainstream browsers to implement standards-compliant behaviour more strictly. This means that the way web content is rendered by the browser depends on the precise document type which is declared, so the document type has to be specified using a correctly structured DOCTYPE declaration in each web page to be rendered. For backwards compatibility, a feature called “doctype switching” is sometimes implemented.
Depending on the document type declaration in the web page, and on whether it is correctly structured, the content may be rendered either according to the HTML standard specified (“standards mode”) or in a backwards-compatible way known as “quirks mode”. But behaviour in quirks mode differs, in general, from browser to browser, version to version, and platform to platform – that is what “quirks” means! So it cannot be predicted what behaviour to expect when the browser and the operating system are not known. Moreover, the doctype switching feature is not guaranteed to be kept for long, and future browser implementations may well drop it in favour of standards-mode-only rendering.
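To illustrate the idea of doctype switching, the Python sketch below applies a deliberately simplified, hypothetical rule set. No DOCTYPE, a pre-HTML 4 DOCTYPE, or an HTML 4.01 Transitional or Frameset DOCTYPE without a system identifier selects quirks mode; anything else selects standards mode. Real browsers use longer trigger tables that differ between products, and some add an intermediate mode, so this should not be read as any particular browser's behaviour.

```python
import re

# Captures the FPI and, if present, the system identifier of a PUBLIC DOCTYPE.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\w+\s+PUBLIC\s+"(?P<fpi>[^"]+)"(?:\s+"(?P<system>[^"]*)")?',
    re.IGNORECASE,
)

# Hypothetical, simplified trigger rules (compared in lower case).
LEGACY_PREFIXES = ("-//w3c//dtd html 3.2", "-//ietf//dtd html")
QUIRKS_WITHOUT_SYSTEM_ID = (
    "-//w3c//dtd html 4.01 transitional//en",
    "-//w3c//dtd html 4.01 frameset//en",
)


def rendering_mode(document: str) -> str:
    """Return "quirks" or "standards" under the simplified rules above."""
    match = DOCTYPE_RE.search(document)
    if match is None:
        return "quirks"  # no recognisable DOCTYPE
    fpi = match["fpi"].lower()
    if fpi.startswith(LEGACY_PREFIXES):
        return "quirks"  # document types older than HTML 4
    if fpi in QUIRKS_WITHOUT_SYSTEM_ID and match["system"] is None:
        return "quirks"  # transitional/frameset FPI with no system identifier
    return "standards"


print(rendering_mode("<html><body>no doctype</body></html>"))  # quirks
```

The point the sketch makes is the one in the text: the same mark-up can be laid out differently purely because of what, if anything, its DOCTYPE declares.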
As mentioned before, in the absence of a usable document type declaration the validation results are not consistent, and the web pages that triggered this diagnostic were removed from further, more detailed analysis. In the end, the most common remaining HTML mark-up defects were considered based only on the results of the validation of 454 web pages over 45 Irish web sites and 8,598 web pages over 843 UK web sites, i.e., only those pages and sites with usable DOCTYPE information.
The HTML compliance tests triggered 35 distinct diagnostics on the sites in the Irish sample and 48 distinct diagnostics on the sites in the UK sample. Although the UK sample triggered more distinct diagnostics than the Irish one, the 10 most common diagnostics are the same for both samples, and their ranking by relative frequency differs by an average of just 2 places. The 10 most common diagnostics, and the percentage of web sites on which each is triggered at least once, are shown in Fig. 1.
Fig. 1 The 10 most common HTML diagnostics
The five most common distinct diagnostics, each triggered at least once on sites in both samples, were:
- Undefined attribute [86.7% of the Irish sites and 92.3% of the UK sites]
  In HTML, each element can carry only the attributes specified in that element's DTD declaration. The “undefined attribute” diagnostic is triggered when, during validation, an element carries an attribute that is not specified in the element's DTD declaration. Besides the case in which the attribute is completely undefined or unknown, some situations in which this diagnostic can be triggered are:
  - the attribute is defined in the HTML standard used, but it is not associated with the element that is using it in the tested HTML document
  - the attribute was introduced for the associated element in a later HTML standard than the one declared in the tested HTML document
  - the attribute belongs to elements that are not defined in the HTML standard used in the tested HTML document
  Unrecognised attributes should simply be ignored by a browser; but this may mean that important information is not rendered for the user, which becomes a real impediment especially in the case of attributes implementing accessibility features.
- Element not allowed by the document type [73.3% of the Irish sites and 72.4% of the UK sites]
  An element is used within another element that, according to the DTD, may not contain it. These defects are usually due to incorrectly nested elements, for example a list item element (LI) used directly within a paragraph element (P) when it should only be used directly within a list element (OL or UL). Again, this error may or may not affect users, depending on the detailed browser behaviour; but the interpretation will generally be unpredictable.
- End tag for element not opened [62.2% of the Irish sites and 66.3% of the UK sites]
  The content of an HTML element is delimited by a start tag and an end tag, and the way that elements can nest is described in the DTD. If onsgmls encounters improperly nested elements, it considers that the outer element is implicitly closed before the start tag of the inner element, and it triggers a “missing end tag” diagnostic if the end tag is required in the structure of the outer element. Later in the document, when the original end tag of the outer element is encountered, onsgmls considers it as belonging to no open element.
  Improperly nested elements are typically invisible to the web content developer, because the most popular web browsers attempt to “repair” bad HTML mark-up. But this repair is heuristic, and cannot guarantee that the web content will be rendered properly by all web browsers, especially when assistive technology is used. As mentioned before, the proper functioning of assistive technologies depends on properly structured HTML mark-up, in accordance with its DTD.
- Required attribute not specified [60.0% of the Irish sites and 76.0% of the UK sites]
  This signals that the DTD declares a certain attribute as required on some element, but it is not present. Many specific features addressing web accessibility for users with disabilities were introduced in the HTML 4 standard. Some of these features are implemented with the help of compulsory attributes, which are very important to assistive technology since they provide details on the content of an element when the element cannot be rendered as intended. Thus, this defect is likely to be directly correlated with accessibility problems.
  An example is the ALT attribute required on the IMG element. The IMG element is used in HTML mark-up to render an image. When non-graphical browsers are used, the image represented by the IMG element cannot be rendered. The ALT attribute is intended to provide “alternative textual content” to be rendered for such users. Thus, if the attribute is not present, such users are denied access to potentially important information.
- Missing required end tag [51.1% of the Irish sites and 48.9% of the UK sites]
  This diagnostic is triggered when the element's DTD declaration specifies that the end tag is required but it is missing from the HTML code of the web page. Some situations in which this diagnostic can be triggered are:
  - In XHTML (as opposed to HTML), every element must be explicitly closed. For example, an “empty” element like HR is correctly used in HTML in the form <HR>, whereas in XHTML the same element should be written in the form <hr /> (XHTML element names must also be lower case).
  - Similarly, some elements that have content do not require an end tag in HTML, for example the P element. In HTML the form "<P> This is a paragraph that starts and ends here." is allowed, whereas in XHTML the correct form would be "<p>This is a paragraph that starts and ends here.</p>".
  - In the case of improperly nested elements, onsgmls considers that the outer element is implicitly closed before the start tag of the inner element, and it triggers a “missing end tag” diagnostic if the end tag is required in the structure of the outer element. Later in the document, if the original end tag of the outer element is encountered, onsgmls considers it as belonging to no open element and triggers an “end tag for element not opened” diagnostic.
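Several of these diagnostics can be approximated with lightweight checks. The Python sketch below uses the standard library's `html.parser` and a tiny, hypothetical excerpt of attribute rules (not a full DTD) to flag undefined attributes, a missing ALT on IMG, LI outside OL or UL, and end tags with no open element. It does not model implicit element closing, so it is no substitute for a real validator such as onsgmls. A separate helper uses `xml.etree.ElementTree` to show why `<hr>` and an unclosed `<p>` are not acceptable XHTML.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Tiny, hypothetical excerpt of attribute rules; a real DTD declares many more.
ALLOWED_ATTRS = {
    "img": {"src", "alt", "width", "height", "title", "longdesc"},
    "p": {"id", "class", "title", "align"},
}
REQUIRED_ATTRS = {"img": {"alt"}}     # ALT is required on IMG in HTML 4.01
EMPTY_ELEMENTS = {"img", "br", "hr"}  # elements that never take an end tag


class MiniChecker(HTMLParser):
    """Flags a few common defects; does not model implicit element closing."""

    def __init__(self) -> None:
        super().__init__()
        self.open_elements: list[str] = []
        self.problems: list[str] = []

    def _check_start(self, tag: str, attrs: list) -> None:
        names = {name for name, _ in attrs}
        # Attributes not listed for this element (elements without rules are skipped).
        for name in sorted(names - ALLOWED_ATTRS.get(tag, names)):
            self.problems.append(f'undefined attribute "{name}" on <{tag}>')
        for name in sorted(REQUIRED_ATTRS.get(tag, set()) - names):
            self.problems.append(f'required attribute "{name}" not specified on <{tag}>')
        if tag == "li":
            parent = self.open_elements[-1] if self.open_elements else "(none)"
            if parent not in ("ol", "ul"):
                self.problems.append(f"<li> not allowed directly inside <{parent}>")

    def handle_starttag(self, tag, attrs):
        self._check_start(tag, attrs)
        if tag not in EMPTY_ELEMENTS:
            self.open_elements.append(tag)

    def handle_startendtag(self, tag, attrs):
        self._check_start(tag, attrs)  # self-closing form: nothing stays open

    def handle_endtag(self, tag):
        if tag in self.open_elements:
            # Close everything down to and including the matching element.
            while self.open_elements.pop() != tag:
                pass
        else:
            self.problems.append(f'end tag for "{tag}" which is not open')


def is_well_formed_xml(fragment: str) -> bool:
    """True if the fragment parses as XML, as XHTML mark-up must."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


checker = MiniChecker()
checker.feed('<p align="left" bgcolor="red"><li>item</li></p><img src="a.gif"></b>')
checker.close()
print(checker.problems)
print(is_well_formed_xml("<hr>"), is_well_formed_xml("<hr />"))  # False True
```

The single sample line triggers four of the five diagnostics discussed above, which also illustrates how one careless fragment can account for several reported defects.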
5. Future Research
The HTML survey presented in the current paper is part of the “Web Accessibility Reporting Project” (WARP) carried out in the eAccessibility lab at RINCE, Dublin City University. The overall project encompasses both HTML validity testing and evaluation against the wider WCAG guidelines.
The W3C WCAG guidelines are referenced in the EU Information Society policies for web accessibility so they should be considered in one way or another when web accessibility policies are developed in all EU Member States. In order to see how these policies are implemented, similar studies will be conducted in the future on web samples from other EU Member States, beyond the UK and Ireland.
6. Conclusion
Although the number of sites in each sample was very different (the UK sample being about 20 times larger than the Irish one), the results of the surveys were remarkably similar. This level of similarity could be explained by the fact that web content authors in both countries very likely use the same range of existing web authoring tools to generate content. Unfortunately, determining the authoring tool used to generate a specific web page is not straightforward, so the conjecture that the similar results are due mostly to the use of similar web authoring tools cannot be directly tested.
The results of the survey showed a disappointingly low level of validity, especially given the effort being invested in promoting web accessibility in both countries sampled here.
Most of the HTML defects on the sites studied are probably not apparent to the majority of web users, because the developers have specifically tested the sites against some (small) selection of “popular” browser platforms. However, because the sites do not conform to technical standards for interoperability, their rendering is, at best, unpredictable. This is likely to have a disproportionate effect on users who rely on specialised, tailored client technologies, specifically users with disabilities. Content may thus fail to be rendered, may be garbled, or may be otherwise inaccessible to such users. Worse, precious development effort in individualising assistive technologies may have to be spent on attempting to compensate for these server-side defects, rather than on improving the client-side functionality that the user really needs. In the worst case, this effort may have to be wasted repeatedly for each different client accessing each different (non-compliant) server. Conformance to technical interoperability standards, by contrast, would substantially reduce or eliminate this waste.
Web content authors should therefore become familiar with the HTML technical specifications and use one of the existing HTML validation tools to ensure that web content is valid before publishing it online. These tools not only list the defects encountered but also give references as to how these defects might be repaired. One such tool is the W3C's online HTML Validator, which is easy to use, requires no local installation, and can validate either online or uploaded web content.
As the analysis of the results shows, most of the defects can be repaired relatively easily. Although repair might seem to involve a considerable amount of work, many of the detected defects are inter-related, so correcting one substantive HTML code defect could eliminate a number of reported diagnostics.
Of course, many web page authors use HTML authoring tools, content management systems, virtual or managed learning environments (VLE/MLE), etc. In such cases the HTML mark-up may be hidden from the author. If the HTML mark-up is found to be invalid, authors can easily be discouraged from providing valid HTML code: they may have no knowledge or understanding of the HTML specification, and hence cannot repair the HTML code (even if the authoring tool allowed such intervention, which it commonly does not!). In this case authors are strongly encouraged to raise the accessibility issue with the developer or vendor of the authoring tool, perhaps even providing the validation results.
This survey was conducted with the purpose of emphasising the importance of valid HTML mark-up in universal access to online information. The results do not represent the exact level of web accessibility of the Irish or UK sites, but they demonstrate a widespread lack of concern with technical interoperability. While it may be argued that the results are based only on samples of sites, the fact that samples with such different numbers of sites produced essentially the same results suggests that this situation (the level of compliance with technical standards) is probably typical of the web as a whole in these countries. This is disappointing because it considerably reduces the potential the web offers for significant improvement in service and opportunity for users with disabilities.
Finally, although valid HTML mark-up is an important step towards an accessible web, there are still many other things to consider. Web publishers should certainly not settle for simply having valid HTML content. The WAI's WCAG 1.0 presents a list of further guidelines which, if followed, can lead to genuinely universal access to information.
Bibliography
- B. Caldwell, W. Chisholm, J. White, G. Vanderheiden, Web Content Accessibility Guidelines 2.0 (WCAG 2.0) W3C Working Draft, World Wide Web Consortium (W3C), http://www.w3c.org/TR/WCAG20, 24 June 2003, accessed 10-07-2003
- D. Raggett, A. Le Hors, I. Jacobs, HTML 4.01 Specification, World Wide Web Consortium (W3C), http://www.w3.org/TR/html401, 24 December 1999, accessed 11-02-2003
- Information Society, eAccessibility: eEurope Targets 2001/2002, European Commission, http://europa.eu.int/information_society/eeurope/action_plan/eaccess/member_states/targets_2001_2002/index_en.htm, accessed 6-02-2003
- Information Society, eAccessibility: Participation for all in the knowledge based economy, European Commission, http://europa.eu.int/information_society/eeurope/action_plan/eaccess, accessed 6-02-2003
- J. Brewer, How People with Disabilities Use the Web, World Wide Web Consortium (W3C), http://www.w3.org/WAI/EO/Drafts/PWD-Use-Web, 4 January 2001, accessed 11-02-2003
- W. Chisholm, G. Vanderheiden, I. Jacobs, Web Content Accessibility Guidelines 1.0 (WCAG 1.0), World Wide Web Consortium (W3C), http://www.w3c.org/TR/WCAG10, 5 May 1999, accessed 11-02-2003
- J. Clark, Building Accessible Websites, New Riders Publishing, October 2002, ISBN 0-7357-1150-X
Appendix A: Case-variations of Formal Public Identifiers
- HTML 4.01 Strict
  - "-//W3C//DTD HTML 4.01//EN" (typical usage)
  - "-//w3c//dtd html 4.01//en"
- HTML 4.01 Transitional
  - "-//W3C//DTD HTML 4.01 Transitional//EN" (typical usage)
  - "-//w3c//dtd html 4.01 transitional//en"
  - "-//W3C//DTD HTML 4.01 transitional//EN"
- HTML 4.01 Frameset
  - "-//W3C//DTD HTML 4.01 Frameset//EN" (typical usage)
  - "-//w3c//dtd html 4.01 frameset//en"
  - "-//W3C//DTD HTML 4.01 frameset//EN"
- HTML 4.0 Strict
  - "-//W3C//DTD HTML 4.0//EN" (typical usage)
  - "-//w3c//dtd html 4.0//en"
- HTML 4.0 Transitional
  - "-//W3C//DTD HTML 4.0 Transitional//EN" (typical usage)
  - "-//w3c//dtd html 4.0 transitional//en"
  - "-//W3C//DTD HTML 4.0 transitional//EN"
- HTML 4.0 Frameset
  - "-//W3C//DTD HTML 4.0 Frameset//EN" (typical usage)
  - "-//w3c//dtd html 4.0 frameset//en"
  - "-//W3C//DTD HTML 4.0 frameset//EN"
- HTML 3.2
  - "-//W3C//DTD HTML 3.2 Final//EN" (typical usage)
  - "-//W3C//DTD HTML 3.2 Draft//EN" (typical usage)
  - "-//W3C//DTD HTML 3.2//EN" (typical usage)
  - "-//W3C//DTD HTML 3.2 final//EN"
  - "-//w3c//dtd html 3.2 final//en"
  - "-//W3C//DTD HTML 3.2 draft//EN"
  - "-//w3c//dtd html 3.2 draft//en"
  - "-//w3c//dtd html 3.2//en"
- HTML (2.0)
  - "-//IETF//DTD HTML//EN" (typical usage)
  - "-//IETF//DTD HTML 2.0//EN" (typical usage)
  - "-//IETF//DTD HTML Level 2//EN" (typical usage)
  - "-//IETF//DTD HTML 2.0 Level 2//EN" (typical usage)
  - "-//IETF//DTD HTML Level 1//EN" (typical usage)
  - "-//IETF//DTD HTML 2.0 Level 1//EN" (typical usage)
  - "-//IETF//DTD HTML Strict//EN" (typical usage)
  - "-//IETF//DTD HTML 2.0 Strict//EN" (typical usage)
  - "-//IETF//DTD HTML Strict Level 2//EN" (typical usage)
  - "-//IETF//DTD HTML 2.0 Strict Level 2//EN" (typical usage)
  - "-//IETF//DTD HTML Strict Level 1//EN" (typical usage)
  - "-//IETF//DTD HTML 2.0 Strict Level 1//EN" (typical usage)
  - "-//ietf//dtd html//en"
  - "-//ietf//dtd html 2.0//en"
  - "-//ietf//dtd html level 2//en"
  - "-//ietf//dtd html 2.0 level 2//en"
  - "-//ietf//dtd html level 1//en"
  - "-//ietf//dtd html 2.0 level 1//en"
  - "-//ietf//dtd html strict//en"
  - "-//ietf//dtd html 2.0 strict//en"
  - "-//ietf//dtd html strict level 2//en"
  - "-//ietf//dtd html 2.0 strict level 2//en"
  - "-//ietf//dtd html strict level 1//en"
  - "-//ietf//dtd html 2.0 strict level 1//en"
- XHTML
  - "-//W3C//DTD XHTML 1.0 Strict//EN" (typical usage)
  - "-//W3C//DTD XHTML 1.0 Transitional//EN" (typical usage)
  - "-//W3C//DTD XHTML 1.0 Frameset//EN" (typical usage)
  - "-//W3C//DTD XHTML 1.1//EN" (typical usage)
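The appendix groups public identifiers that differ only in letter case under a single HTML version. A sketch of that grouping follows, assuming that case-insensitive comparison (after collapsing runs of white space) is the grouping rule; the survey's actual classification procedure is not described here. The names `TYPICAL_FPIS`, `LOOKUP` and `classify_doctype`, and the `"unrecognised"` label, are hypothetical, and the regular expression only handles `PUBLIC` DOCTYPE declarations with a quoted identifier.

```python
import itertools
import re

# Version families and their "typical usage" identifiers, as grouped above.
TYPICAL_FPIS = {
    "HTML 4.01 Strict": ["-//W3C//DTD HTML 4.01//EN"],
    "HTML 4.01 Transitional": ["-//W3C//DTD HTML 4.01 Transitional//EN"],
    "HTML 4.01 Frameset": ["-//W3C//DTD HTML 4.01 Frameset//EN"],
    "HTML 4.0 Strict": ["-//W3C//DTD HTML 4.0//EN"],
    "HTML 4.0 Transitional": ["-//W3C//DTD HTML 4.0 Transitional//EN"],
    "HTML 4.0 Frameset": ["-//W3C//DTD HTML 4.0 Frameset//EN"],
    "HTML 3.2": [
        "-//W3C//DTD HTML 3.2 Final//EN",
        "-//W3C//DTD HTML 3.2 Draft//EN",
        "-//W3C//DTD HTML 3.2//EN",
    ],
    # The twelve IETF identifiers combine an optional " 2.0", an optional
    # " Strict" and an optional level, in that order.
    "HTML (2.0)": [
        f"-//IETF//DTD HTML{version}{strict}{level}//EN"
        for version, strict, level in itertools.product(
            ["", " 2.0"], ["", " Strict"], ["", " Level 2", " Level 1"]
        )
    ],
    "XHTML": [
        "-//W3C//DTD XHTML 1.0 Strict//EN",
        "-//W3C//DTD XHTML 1.0 Transitional//EN",
        "-//W3C//DTD XHTML 1.0 Frameset//EN",
        "-//W3C//DTD XHTML 1.1//EN",
    ],
}

# Assumption: identifiers differing only in case belong to the same family,
# so the lookup key is the case-folded identifier.
LOOKUP = {
    fpi.casefold(): family
    for family, fpis in TYPICAL_FPIS.items()
    for fpi in fpis
}

# Captures the first quoted string after PUBLIC (single or double quotes).
DOCTYPE_RE = re.compile(
    r"<!DOCTYPE\s+html\s+PUBLIC\s+([\"'])(.*?)\1", re.IGNORECASE | re.DOTALL
)


def classify_doctype(page):
    """Return the version family of the page's DOCTYPE, or None if it has none."""
    match = DOCTYPE_RE.search(page)
    if match is None:
        return None
    fpi = " ".join(match.group(2).split())  # collapse runs of white space
    return LOOKUP.get(fpi.casefold(), "unrecognised")


print(classify_doctype('<!doctype html public "-//w3c//dtd html 4.01 transitional//en">'))
# prints HTML 4.01 Transitional
```

A dictionary keyed on the case-folded identifier keeps every lower-case or mixed-case variant listed above pointing at the same family without enumerating each variant by hand.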



