What is a DTD?

DTD (document type definition)

The Document Type Definition (DTD) defines the permitted elements and their attributes for an XML document. A DTD corresponds roughly to a vocabulary for a specific class of XML documents, and its grammar supplies the rules for deciding whether the content of a document is valid or invalid. With a DTD, a document can be checked for validity, that is, whether its structure conforms to the DTD. A valid document is always well-formed.

The syntax of DTDs is only expressive enough to decide which elements may appear in an XML document. DTDs do not allow the content of the elements themselves to be constrained in any detail. Precise data typing only became possible with XML Schema, standardized by the World Wide Web Consortium (W3C).

If neither a DTD nor a schema is defined for a document, it can be checked only for well-formedness, not for validity. In addition to DTDs and W3C XML Schema, RELAX NG (Regular Language for XML Next Generation) is another way of describing the structure of an XML document.

In connection with XML documents, a distinction must be made between well-formedness and validity:

  • Well-formedness refers to the syntactic correctness of an XML document.
  • An XML document is valid if its structure conforms to the associated DTD or, alternatively, to the referenced schema. Valid XML documents are always well-formed.
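
The well-formedness half of this distinction can be tried out with a few lines of Python. This is a minimal sketch using only the standard library; the helper name is_well_formed and the sample documents are illustrative assumptions. Note that xml.etree.ElementTree checks syntax only and does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, i.e. is syntactically correct."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<note><to>Ann</to><body>Hello</body></note>"
bad = "<note><to>Ann</body></note>"  # end tag does not match the open tag

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document that passes this check may still be invalid; deciding that requires a validating parser and a DTD or schema.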

Once a specific DTD has been agreed for an XML document, it is the task of a validating XML parser to determine whether the document complies with the defined rules. For this, the XML document must either include the DTD (internal DTD) or refer to it (external DTD). A DTD always consists of a set of markup declarations, which determine which elements may appear in a document and how they may be used. The DTD thus imposes additional restrictions on a well-formed XML document. An XML processor may also dispense with the validity check; in that case the document still retains its status as well-formed.
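
The two ways of attaching a DTD can be sketched as follows. The example uses Python's standard xml.parsers.expat module, which is a non-validating parser: it reports the document type declaration but neither fetches the external file nor checks validity. The helper doctype_info, the element name note, and the file name note.dtd are assumptions made for illustration.

```python
from xml.parsers import expat


def doctype_info(text: str) -> dict:
    """Report how a document attaches its DTD (no validation is performed)."""
    info = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info["root"] = name
        info["system_id"] = system_id          # external DTD reference, if any
        info["internal"] = bool(has_internal_subset)

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(text, True)
    return info


# DTD included in the document itself (internal subset).
internal_doc = """<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
]>
<note>Hello</note>"""

# DTD kept in a separate file and only referenced (external DTD).
external_doc = """<!DOCTYPE note SYSTEM "note.dtd">
<note>Hello</note>"""

print(doctype_info(internal_doc))
print(doctype_info(external_doc))
```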

The element types permitted in the document with their content models and attributes can be defined in DTDs. A content model defines the permitted content of an element. The following content models are established by the XML specification:

  • EMPTY: the element has no content but may have attributes.
  • ANY: the element can have any content, as long as it is well-formed XML.
  • #PCDATA: the element contains only character data.
  • Mixed content: the element can contain character data together with further sub-elements.
  • Element content: the element contains only sub-elements.
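
The content models listed above can be illustrated with one declaration each. The sketch below is an assumption-laden example, not a reference implementation: the element names (article, title, para, em, br, extra) are invented, and Python's non-validating expat parser is used only to read the declarations back. expat reports a pure (#PCDATA) model under the same "mixed" type as mixed content.

```python
from xml.parsers import expat

# One declaration per content model; element names are illustrative only.
DOC = """<!DOCTYPE article [
  <!ELEMENT article (title, para+)>
  <!ELEMENT title   (#PCDATA)>
  <!ELEMENT para    (#PCDATA | em)*>
  <!ELEMENT em      (#PCDATA)>
  <!ELEMENT br      EMPTY>
  <!ELEMENT extra   ANY>
]>
<article><title>T</title><para>text <em>x</em></para></article>"""

# Map expat's content-type constants to readable labels.
KIND = {
    expat.model.XML_CTYPE_EMPTY: "EMPTY",
    expat.model.XML_CTYPE_ANY: "ANY",
    expat.model.XML_CTYPE_MIXED: "mixed / #PCDATA",
    expat.model.XML_CTYPE_SEQ: "element content (sequence)",
    expat.model.XML_CTYPE_CHOICE: "element content (choice)",
}


def content_models(text: str) -> dict:
    """Return {element name: content model label} for the internal DTD."""
    found = {}

    def on_element_decl(name, model):
        # model is a tuple (type, quantifier, name, children)
        found[name] = KIND[model[0]]

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(text, True)
    return found


for name, kind in content_models(DOC).items():
    print(name, "->", kind)
```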

Before using DTDs, it is first necessary to check what kind of data is involved. DTDs offer only limited means for describing specialized data formats and are much better suited to text documents. For the same reason they are a poor fit for database systems, since DTD syntax and its few data types are not expressive enough for that purpose. For many applications, using DTDs to validate documents is not an option because of the following restrictions:
  • DTDs use their own, non-XML syntax, so dedicated tools are required.
  • Data typing is not supported.
  • For element content there is only the data type #PCDATA.
  • Cardinality can be expressed only coarsely (?, * and +); exact minimum and maximum counts are not available.
  • DTDs are not aware of XML namespaces.
  • Extensibility is poor.
  • The declarations in a DTD are generally global, which runs counter to object-oriented modeling.
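
The cardinality restriction can be made concrete. Because a DTD offers only the occurrence indicators ?, * and +, a range such as "two to four items" has to be spelled out particle by particle. The following sketch generates such a content model; bounded_model and optional_tail are hypothetical helpers written for this illustration, and the optional part is nested so that the model stays deterministic.

```python
def optional_tail(name: str, count: int) -> str:
    """Nested optional particles for up to `count` further occurrences."""
    if count == 0:
        return ""
    if count == 1:
        return name + "?"
    return "(" + name + ", " + optional_tail(name, count - 1) + ")?"


def bounded_model(name: str, lo: int, hi: int) -> str:
    """Spell out 'between lo and hi occurrences' as a DTD content model."""
    if lo < 0 or hi < max(lo, 1):
        raise ValueError("need 0 <= lo <= hi and hi >= 1")
    parts = [name] * lo                      # required occurrences
    tail = optional_tail(name, hi - lo)      # optional occurrences
    if tail:
        parts.append(tail)
    return "(" + ", ".join(parts) + ")"


print("<!ELEMENT list " + bounded_model("item", 2, 4) + ">")
# prints: <!ELEMENT list (item, item, (item, item?)?)>
```

The declaration grows with the upper bound, which is why larger ranges quickly become impractical to express in a DTD.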

Nevertheless, two well-known applications from the world of XML that are based on DTDs are described below.

SVG: Scalable Vector Graphics (SVG) is an XML-based language, specified by a DTD, for representing two-dimensional graphics. SVG is used, for example, to present graphic content on cell phones or PDAs.

SMIL: Synchronized Multimedia Integration Language (SMIL) is a language whose first version was standardized by the W3C in 1998. It is used for the interactive presentation of text, video and images.