August 27, 2002
Copyright © 2002 by Open eBook Forum™
All rights reserved. This work is protected under Title 17 of the United States Code. Reproduction and dissemination of this work with changes is prohibited except with the written permission of the Open eBook Forum.
In order for electronic-book technology to achieve widespread success in the marketplace, Reading Systems must have convenient access to a large number and variety of titles. The Open eBook Publication Structure (OEBPS) is a specification for representing the content of electronic books. Specifically:
This specification combines subsets and applications of other specifications. Together, these facilitate the construction, organization, presentation, and unambiguous interchange of electronic documents:
OEBPS is based on XML because of its generality and simplicity, and because XML documents are likely to adapt well to future technologies and uses. XML also provides well-defined rules for the syntax of documents, which decreases the cost to implementers and reduces incompatibility across systems. Further, XML is extensible: it is not tied to any particular set of element types, it supports internationalization, and it encourages document markup that can represent a document's internal parts more directly, making them amenable to automated formatting and other types of computer processing.
XML well-formedness requires characteristics beyond what HTML browsers typically require, such as:
Empty elements (such as the HTML br and hr elements) are those that permit no content. The XML and formal HTML syntaxes for these are incompatible, though the XML form, with whitespace before the trailing slash, is accepted by most HTML browsers. The addition of this whitespace remains strictly conformant XML, as XML ignores whitespace within tags. Hence, this specification strongly encourages, though does not require, this conforming variation of the XML form (for example, “<br />”). This is the most portable syntax and it contributes to document longevity, even though, strictly speaking, it is not valid in HTML.
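The difference between the XML and HTML forms of an empty element can be checked with any XML parser. The following is a minimal sketch using Python's standard library; the helper name is_well_formed is hypothetical and is not part of this specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# XML empty-element form with whitespace before the slash (the encouraged form).
print(is_well_formed("<p>line one<br />line two</p>"))  # True
# XML empty-element form without the whitespace: also well-formed XML.
print(is_well_formed("<p>line one<br/>line two</p>"))   # True
# HTML form: the br element is never closed, so this is not well-formed XML.
print(is_well_formed("<p>line one<br>line two</p>"))    # False
```

The check only covers XML well-formedness; whether a given HTML browser accepts each form is outside what this sketch can show.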
Syntactic transformation from valid HTML to well-formed XML is trivial (though semantic transformations that also add brand-new structure and information value may not be). Transformation from invalid but moderately clean HTML is also usually an easy process and easily automated: several free tools already exist for this, such as “Tidy” (see http://www.w3.org/People/Raggett/tidy/). Transformation from extremely dirty HTML to XML, however, is of unpredictable complexity.
Not all well-formed XML 1.0 documents are conformant OEBPS Documents. This specification imposes further constraints in order to improve interoperability. These constraints are the “OEBPS Common Requirements,” defined below.
This specification contains two XML DTDs – the OEBPS Package DTD (Appendix A) and the Basic OEBPS Document DTD (Appendix B). The OEBPS Package file (which, beyond well-formed, must be valid XML) provides the “framework” for a complete publication, and Reading Systems should use it to find and organize publication components. The Basic OEBPS Document DTD formally defines the XHTML 1.1 subset described in this specification.
This specification ensures that for any Basic OEBPS Document, there is a syntax form that:
This version of the specification does not require, but does allow, Reading Systems to process XML namespaces according to the XML Namespaces Recommendation at http://www.w3.org/TR/REC-xml-names.
Namespace prefixes distinguish identical names that are drawn from different XML vocabularies. An XML namespace declaration in an XML document associates a namespace prefix with a unique URI. The prefix can then be employed on element or attribute names in the document. Alternatively, a namespace declaration in an XML document may identify a URI as the default namespace, applicable to elements lacking a namespace prefix. The XML namespace prefix is separated from the suffix element or attribute name by a colon.
OEBPS Documents must not contain declarations of default namespaces that reference namespaces other than the XHTML namespace (“http://www.w3.org/1999/xhtml”). Conversely, any declarations of prefixed namespaces within OEBPS Documents must not reference the XHTML namespace.
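The two namespace constraints above could be checked along the following lines. This is a sketch only, assuming a namespace-aware parser (which this specification does not require of Reading Systems); the function name namespace_violations is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def namespace_violations(document: str) -> list:
    """List namespace declarations that break the two rules stated above."""
    problems = []
    source = io.BytesIO(document.encode("utf-8"))
    # "start-ns" events report every namespace declaration as (prefix, uri);
    # the default namespace is reported with an empty prefix.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        if prefix == "" and uri != XHTML_NS:
            problems.append(f"default namespace {uri!r} is not the XHTML namespace")
        elif prefix != "" and uri == XHTML_NS:
            problems.append(f"prefix {prefix!r} is bound to the XHTML namespace")
    return problems


ok = '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>'
bad_default = '<html xmlns="http://example.com/other"><body/></html>'
bad_prefix = '<html xmlns:h="http://www.w3.org/1999/xhtml"><h:body/></html>'

print(namespace_violations(ok))           # []
print(len(namespace_violations(bad_default)))  # 1
print(len(namespace_violations(bad_prefix)))   # 1
```

The URI http://example.com/other is a placeholder chosen for the illustration.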
If a Reading System is not namespace-aware, any element within an OEBPS Document that contains a namespace prefix is treated as an Extended OEBPS Document element, with the colon acting as a normal XML Name character, afforded no special meaning.
The use of the dc: prefix, however, is required for Dublin Core metadata element attributes in the OEBPS Package file. For upwards compatibility, the element dc-metadata in an OEBPS Package file is required to have the attributes xmlns:dc="http://purl.org/dc/elements/1.1/" and xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/". In addition, the Dublin Core elements are declared in the OEBPS Package DTD with an explicit prefix of dc:.
This specification recognizes the importance of current software tools, legacy data, publication practices, and market conditions, and has therefore based the Basic OEBPS Document vocabulary on XHTML 1.1. This approach allows content providers to exploit current XHTML content, tools, and expertise.
To minimize the implementation burden on Reading System implementers (who may be working with devices that have power and display constraints), the Basic OEBPS Document element set does not include all XHTML 1.1 elements and attributes. The elements and attributes were selected from the XHTML 1.1 specification and were chosen to be consistent with current directions in XHTML.
Any construct deprecated in XHTML 1.1 is either deprecated or omitted from this specification; CSS-based equivalents are provided in most such cases. Style sheet constructs are also used for new presentational functionality beyond that provided in XHTML.
To achieve predictable results, for greater document interoperability, and to support upwards compatibility with future versions of this specification, it is strongly recommended that Basic OEBPS Documents be valid XML documents with respect to the Basic OEBPS Document DTD.
This specification defines a style language based on CSS 2, with a media type of “text/x-oeb1-css”. The Publication Structure Working Group is aware that this definition of a media type goes against the recommendation of the CSS Working Group, but has chosen to do so due to practical considerations.
The CSS-based style sheet constructs in this specification define required rendering functionality. To minimize the burden on Reading System developers and device manufacturers, not all CSS 2 properties are included. A few additional properties and values have been added to support page layout, headers, and footers.
In a number of cases, this specification does not require Reading Systems to provide the full range of rendering that a standard CSS style sheet might request. For example, some Reading Systems will use monochrome displays. It would be acceptable neither to limit all Reading Systems to monochrome nor to declare color use a non-standardized extension beyond OEBPS. In such cases, the CSS settings are allowed, and keep their meanings; but a conforming Reading System may gracefully degrade to a simpler rendering.
This specification supports the style attribute (though deprecated), the style element, and externally linked style sheets. Reading Systems need not perform XML-namespace handling while processing style sheets.
Style sheets may be associated with an OEBPS Document in several ways:
The relative priority of the first three cases is as defined for XHTML 1.1 and CSS 2. Style sheets linked via a processing instruction are treated as if they had been linked via XHTML link elements preceding any actual XHTML link elements. As defined in the Conformance section, if no style sheet is defined or no applicable style is found for a given element, XHTML rendering is the default as defined elsewhere in this specification.
Styles attached via the first two methods listed above must use only those CSS constructs defined in Section 4 of this specification. External style sheets linked via the XHTML link element or by the processing instruction xml-stylesheet, however, may use this or any other style language, such as XSL (see http://www.w3.org/TR/xsl).
Style sheets of type “text/x-oeb1-css” must employ only those CSS constructs defined as supported in Section 4 of this specification. Style sheets of other MIME media types may be substituted for the text/x-oeb1-css style sheets at the discretion of the Reading System.
The XHTML 1.1 specification groups externally linked style sheets into sets by their titles (including a “persistent” set for which the title is the null string). This specification requires that at least one style sheet in each such set must be of MIME media type “text/x-oeb1-css”.
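The per-set requirement can be sketched as a grouping check. The input shape used here, a list of (title, media type) pairs with the empty string standing for the persistent set, is an assumption made for the illustration, as is the function name.

```python
from collections import defaultdict

OEB_CSS = "text/x-oeb1-css"


def sets_missing_oeb_css(links):
    """Return the titles of style sheet sets that contain no text/x-oeb1-css sheet.

    links: iterable of (title, media_type) pairs; title "" is the persistent set.
    """
    types_by_title = defaultdict(set)
    for title, media_type in links:
        types_by_title[title].add(media_type)
    return sorted(
        title for title, types in types_by_title.items() if OEB_CSS not in types
    )


links = [
    ("", "text/x-oeb1-css"),          # persistent set: satisfied
    ("Large print", "text/x-oeb1-css"),
    ("Large print", "text/xsl"),      # extra sheet in another language is allowed
    ("Fancy", "text/xsl"),            # this set has no OEBPS CSS sheet
]
print(sets_missing_oeb_css(links))  # ['Fancy']
```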
Reading Systems that implement only the OEBPS CSS subset may ignore any style sheets using other style languages. Reading Systems that support extended style sheet functionality may choose among any of the other external style sheets. It is strongly recommended that unique MIME media types be defined for any novel style sheet languages supported, and that style sheets in those languages be detected by examining the MIME media type.
The Dublin Core is designed to minimize the cataloging burden on authors and publishers, while providing enough metadata to be useful. This specification supports the set of Dublin Core 1.1 metadata elements (http://dublincore.org/documents/1999/07/02/dces/), supplemented with a small set of additional attributes addressing areas where more specific information may be useful. For example, the role attribute added to the dc:Contributor element allows for much more detailed specification of contributors to a publication, including their roles expressed via relator codes.
Content providers must include a minimum set of metadata elements, defined in section 2.2, and should incorporate additional metadata to enable readers to discover publications of interest.
Publications may use the entire Unicode character set, in UTF-8 or UTF-16 encodings, as defined by Unicode (see http://www.unicode.org/). The use of Unicode facilitates internationalization and multilingual documents. However, Reading Systems are not required to provide glyphs for all Unicode characters.
Reading Systems must parse all UTF-8 and UTF-16 characters properly (as required by XML). Reading Systems may decline to display some characters, but must be capable of signaling in some fashion that undisplayable characters are present. They must not display Unicode characters merely as if they were 8-bit characters. For example, the biohazard symbol (0x2623) need not be supported by including the correct glyph, but must not be parsed or displayed as if its component bytes were the two characters “&#” (0x0026 0x0023).
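The biohazard example can be reproduced directly. The sketch below shows the correct encodings of U+2623 and the misreading the paragraph warns against; the display_text helper and its use of U+FFFD as the signal for an undisplayable character are assumptions for illustration, since this specification leaves the signaling method open.

```python
biohazard = "\u2623"

# Correct encodings of the single character U+2623.
print(biohazard.encode("utf-8"))      # b'\xe2\x98\xa3'
print(biohazard.encode("utf-16-be"))  # b'&#'  (the bytes 0x26 0x23)

# The error to avoid: treating the two UTF-16 bytes as two 8-bit characters.
misread = biohazard.encode("utf-16-be").decode("latin-1")
print(misread)  # &#


def display_text(text, has_glyph):
    """Replace characters the device cannot draw with a visible marker.

    has_glyph is a hypothetical callback reporting whether a glyph exists.
    """
    return "".join(ch if has_glyph(ch) else "\ufffd" for ch in text)


# A device with ASCII-only glyphs still signals that a character is present.
print(display_text("A\u2623", lambda ch: ch.isascii()) == "A\ufffd")  # True
```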
This specification defines a list of OEBPS Core Media Types that all Reading Systems must support (as required by this specification) and publications may include. Publications may include resources of other media types, but for each such resource must include an alternative resource of an OEBPS Core Media Type (using methods defined in this specification).
The OEBPS Core Media Types are:
| MIME Media Type | Reference | Description |
| image/jpeg | RFC 2046 | Used for raster graphics |
| image/png | RFC 2083 | Used for raster graphics |
| text/x-oeb1-document | this specification | Used for Basic or Extended OEBPS Documents |
| text/x-oeb1-css | this specification | Used for OEBPS CSS-subset style sheets |
| application/xml-dtd | RFC 3023 | Used for DTDs included with the publication |
| application/xml-external-parsed-entity | RFC 3023 | Used for external parsed entity documents |
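The rule that every resource of a non-core media type needs a core-type alternative can be sketched as follows. The resource mapping (name to media type and name of the alternative) is a hypothetical data shape invented for this example, and only one level of alternative is followed.

```python
CORE_MEDIA_TYPES = frozenset({
    "image/jpeg",
    "image/png",
    "text/x-oeb1-document",
    "text/x-oeb1-css",
    "application/xml-dtd",
    "application/xml-external-parsed-entity",
})


def resources_missing_fallback(resources):
    """Return names of non-core resources that lack a core-type alternative.

    resources: dict mapping a resource name to (media_type, alternative_name),
    where alternative_name is None when no alternative is declared.
    """
    missing = []
    for name, (media_type, alternative) in resources.items():
        if media_type in CORE_MEDIA_TYPES:
            continue  # core types need no alternative
        alt = resources.get(alternative)
        if alt is None or alt[0] not in CORE_MEDIA_TYPES:
            missing.append(name)
    return sorted(missing)


resources = {
    "cover.png": ("image/png", None),
    "fig.svg": ("image/svg+xml", "fig.png"),  # has a PNG alternative
    "fig.png": ("image/png", None),
    "clip.mov": ("video/quicktime", None),    # no alternative declared
}
print(resources_missing_fallback(resources))  # ['clip.mov']
```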
This specification includes support for the XML style sheet processing instruction xml-stylesheet, defined in the W3C Recommendation “Associating Style Sheets with XML Documents” (http://www.w3.org/TR/xml-stylesheet). In this specification, the allowed pseudo-attributes for xml-stylesheet are those corresponding to the allowed attributes for XHTML link when used to identify an external style sheet. This processing instruction is placed in the prolog of the XML document. It can appear multiple times as link can.
This section defines conformance for OEBPS Documents, Publications, and Reading Systems.
This specification defines two named levels of conformance for OEBPS Documents (Basic and Extended) and one conformance level for OEBPS Publications. An OEBPS Document is conforming if and only if it is either a Basic OEBPS Document or an Extended OEBPS Document.
Each conformant OEBPS Document (whether Basic or Extended) and each conformant OEBPS Package File must meet these necessary conditions, referred to in this specification as the “Common Requirements:”
A conformant OEBPS Document (whether Basic or Extended) must meet these necessary conditions, referred to in this specification as the “common document requirements:”
A document is a Basic OEBPS Document if and only if:
A document is an Extended OEBPS Document if and only if:
OEBPS Documents, Basic or Extended, may or may not be valid (as defined in XML 1.0) with respect to an associated DTD. However, all OEBPS Documents must be well-formed XML 1.0 documents.
A collection of files is a conforming OEBPS Publication if and only if:
This specification defines only one level of conformance for a Reading System. A Reading System is conformant if and only if it processes documents as follows:
Note: Reading Systems are not required to support XML entity and attribute declarations (beyond parsing past them as XML requires), because such constructs are not permitted in conforming OEBPS Documents.
It is the intent of the contributors to this specification that subsequent generations of this specification continue in the directions established by the 1.0 release. Specifically:
Version 1.2 of the OEBPS Publication Structure is not meant to be a substantially “new” specification. However, version 1.2 does add functional enhancements over 1.0.1, largely supporting the goal of allowing enhanced control over content presentational fidelity. Specifically, the following are the most substantive additions:
It was a goal of version 1.2 that all documents conformant according to version 1.0.1 would remain conformant under 1.2. However, removal of elements deprecated in 1.0.1 (e.g. font) and the addition of namespace requirements (see Section 1.3.3) rendered full compatibility with version 1.0.1 impossible.
Use of Extended OEBPS Documents is the recommended mechanism for adding information and structure beyond that provided by the XHTML subset defined in this specification (e.g. to associate further semantics with content). Arbitrary non-OEBPS elements may be added as long as such elements are provided with style definitions in accompanying style sheets.
For example, the following document would be an Extended OEBPS Document excerpt:
<chapter>
  <milestone n="257" />
  <chapterhead>Chapter one</chapterhead>
  <p>Now is the time… </p>
</chapter>
if associated with a style sheet containing the following excerpt:
chapter {page-break-before: always; display: block}
milestone {display: none}
chapterhead {
  font-weight: bold;
  font-family: sans-serif;
  text-align: center;
  display: block;
  margin-top: 4ex
}
This specification incorporates features that ensure content can be made accessible to, and usable by, persons with reading disabilities. Existing accessibility features developed by the World Wide Web Consortium (W3C) for XHTML 1.1 for content accessibility are incorporated into the OEBPS specification.
OEBPS Publications should be authored in accordance with the W3C Web Content Accessibility Guidelines 1.0 (http://www.w3.org/TR/1999/WAI-WEBCONTENT-19990505/) to ensure that the broadest possible set of users will have access to books delivered in this format.
In addition, recommendations from the W3C HTML 4.0 Guidelines for Mobile Access (http://www.w3.org/TR/NOTE-html40-mobile/) and the W3C Web Accessibility Initiative's proposed User Agent Guidelines (http://www.w3.org/TR/WD-WAI-USERAGENT/) should be reviewed and applied by OEBPS implementers to ensure that Reading Systems will be in conformance with accessibility requirements.
This specification is designed to take advantage of current practices while preparing for future developments. Although details of subsequent versions of this specification remain to be determined, it is the expectation of the Publication Structure Working Group that continued evolutionary development will occur. The “themes” driving the creation of version 2.0 of the OEBPS Publication Structure are: standards compliance (e.g. full namespace support), metadata modularization, enhanced support for linking and navigation, and better support for international content. Other themes deemed important for future versions include: more rigorous separation of content and presentation, greater accessibility, Reading Device-specific presentation control and/or Reading Device profiles, application-specific markup (e.g. math, chemical), Publication container file format, multiple reading orders, and support for active content (e.g. multimedia, scripting), all while maintaining alignment with relevant standards. Additionally, maintaining backward compatibility to this version of this specification is a high priority. Future directions can be tracked at http://www.openebook.org.
Metadata support for OEBPS content is currently under development in other working groups within the OEBF; the Dublin Core constructs included in the OEBPS 1.2 Package File are only intended to provide a minimal level of metadata support while the work of those groups is being completed, as well as to maintain compatibility with 1.0.1.
A publication conforming to this specification must include exactly one OEBPS Package file, which specifies the OEBPS Documents, images, and other objects that make up the OEBPS Publication and how they relate to each other.
The package file should be named using the extension “.opf”, in order to make it readily identifiable within the group of files making up the publication. Package files are of MIME media type “text/xml”. This specification does not define means for physically bundling files together to make one data transfer object (such as using zip or tar).
It is not required that the OEBPS Package DTD be physically included in every publication. If included, it should be referenced from the manifest (as described below for other files).
The major parts of the OEBPS Package file are:
An OEBPS Package must be a valid XML document conforming to the OEBPS Package DTD (Appendix A). Appendix C includes the mnemonic character entities file associated with the OEBPS Package DTD. An informal outline of the package is as follows:
<?xml version="1.0"?>
<!DOCTYPE package
PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Package//EN"
"http://openebook.org/dtds/oeb-1.2/oebpkg12.dtd">
<package>
metadata
manifest
spine
guide
</package>
The following sections describe the parts of the OEBPS Package.
The package element is the root element in a package file; all other elements are nested within it.
The package must specify a value for its unique-identifier attribute. The unique-identifier attribute's value specifies which dc:Identifier element, described in section 2.2.10, provides the package's preferred, or primary, identifier. The package file's author is responsible for choosing a primary identifier that is unique to one and only one particular package (i.e., the set of files referenced from the package file's manifest).
Notwithstanding the requirement for uniqueness, Reading Systems must not fail catastrophically should they encounter two distinct packages with the same purportedly unique primary identifier.
The required metadata element is used to provide information about the publication as a whole. It contains a Dublin Core metadata record within a dc-metadata element, and supplemental metadata in an x-metadata element.
The required dc-metadata element contains specific publication-level metadata as defined by the Dublin Core Metadata Initiative (http://dublincore.org/). The descriptions below are included for convenience, and the Dublin Core's own definitions take precedence (see http://dublincore.org/documents/1999/07/02/dces/).
The optional x-metadata element, if present, must contain one or more instances of a meta element, analogous to the XHTML 1.1 meta element, but applicable to the publication as a whole. The x-metadata element allows content providers to express arbitrary metadata beyond the data described by the Dublin Core specification. Individual OEBPS Documents may include the meta element directly (as in XHTML 1.1) for document-specific metadata. This specification uses the OEBPS Package file alone as the basis for expressing publication-level Dublin Core metadata.
For example:
<metadata>
  <dc-metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/">
    …
  </dc-metadata>
  <x-metadata>
    <meta name="price" content="USD 19.99" />
  </x-metadata>
</metadata>
The XML namespace mechanism (see http://www.w3.org/TR/REC-xml-names/) is used to identify the elements used for Dublin Core metadata without conflict. Note that there is no requirement on Reading Systems to process namespaces. This syntax is used to provide for upwards-compatibility.
The dc-metadata element can contain any number of instances of any Dublin Core elements. Dublin Core element names begin with the “dc:” prefix followed by a leading uppercase letter. Dublin Core metadata elements may occur in any order; in fact, multiple instances of the same element type (multiple dc:Creator elements, for example) can be interspersed with other metadata elements without change of meaning.
For upwards-compatibility, the element dc-metadata in an OEBPS Package must have the attributes xmlns:dc="http://purl.org/dc/elements/1.1/" and xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/".
Each Dublin Core field is represented by an element whose content is the field's value. At least one of each of dc:Title, dc:Identifier and dc:Language must be included in the dc-metadata element. Dublin Core elements, like any other elements in the OEBPS Package file, may have an id attribute specified. At least one dc:Identifier, that which is referenced from the package unique-identifier attribute, must have an id specified.
Because the Dublin Core metadata fields for Creator and Contributor do not distinguish roles of specific contributors (such as author, editor, and illustrator), this specification adds an optional role attribute for this purpose. See section 2.2.6 for a discussion of role.
To facilitate machine processing of dc:Creator and dc:Contributor fields, this specification adds the optional file-as attribute for those elements. This attribute is used to specify a normalized form of the contents. See section 2.2.6 for a discussion of file-as.
This specification also adds a scheme attribute to the dc:Identifier element to provide a structural mechanism to separate an identifier value from the system or authority that generated or defined that identifier value. See section 2.2.10 for a discussion of scheme.
This specification also adds an event attribute to the dc:Date element to enable content providers to distinguish various publication specific dates (for example, creation, publication, modification). See section 2.2.7 for a discussion of event.
For example:
<package unique-identifier="xyz">
  <metadata>
    <dc-metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/">
      <dc:Title>Alice in Wonderland</dc:Title>
      <dc:Language>en</dc:Language>
      <dc:Identifier id="xyz"
                     scheme="ISBN">123456789X</dc:Identifier>
      <dc:Creator role="aut">Lewis Carroll</dc:Creator>
    </dc-metadata>
  </metadata>
  ...
</package>
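As an illustration of how the unique-identifier attribute and the required Dublin Core elements relate, the following sketch resolves the primary identifier of a package like the one in the example. It assumes a namespace-aware parser (not required of Reading Systems by this specification); the function names primary_identifier and missing_required are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
REQUIRED = ("Title", "Identifier", "Language")

EXAMPLE_PACKAGE = """\
<package unique-identifier="xyz">
  <metadata>
    <dc-metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/">
      <dc:Title>Alice in Wonderland</dc:Title>
      <dc:Language>en</dc:Language>
      <dc:Identifier id="xyz" scheme="ISBN">123456789X</dc:Identifier>
      <dc:Creator role="aut">Lewis Carroll</dc:Creator>
    </dc-metadata>
  </metadata>
</package>
"""


def primary_identifier(package_xml):
    """Return (scheme, value) of the dc:Identifier named by unique-identifier."""
    root = ET.fromstring(package_xml)
    wanted = root.get("unique-identifier")
    if wanted is None:
        raise ValueError("package has no unique-identifier attribute")
    for ident in root.iter(f"{{{DC_NS}}}Identifier"):
        if ident.get("id") == wanted:
            return ident.get("scheme"), (ident.text or "").strip()
    raise ValueError(f"no dc:Identifier with id={wanted!r}")


def missing_required(package_xml):
    """Return the required Dublin Core element names that are absent."""
    root = ET.fromstring(package_xml)
    return [n for n in REQUIRED if root.find(f".//{{{DC_NS}}}{n}") is None]


print(primary_identifier(EXAMPLE_PACKAGE))  # ('ISBN', '123456789X')
print(missing_required(EXAMPLE_PACKAGE))    # []
```

A full conformance check would validate against the OEBPS Package DTD; this sketch only inspects the two rules named in its lead-in.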
There are no attributes for the elements within dc-metadata defined by Dublin Core – only the elements' contents are so defined.
The following subsections describe the individual Dublin Core metadata elements.
The title of the publication. An OEBPS Package must include at least one instance of this element type; however, multiple instances are permitted. Any Reading System that displays title metadata to the user should use either the first dc:Title only or all dc:Title elements.
A primary creator or author of the publication. Additional contributors whose contributions are secondary to those listed in dc:Creator elements should be named in dc:Contributor elements.
Publications with multiple co-authors should provide multiple dc:Creator elements, each containing one author. The order of dc:Creator elements is presumed to define the order in which the creators' names should be presented by the Reading System.
This specification recommends that the content of the dc:Creator elements hold the text for a single name as it would be presented to the user.
This specification adds to the dc:Creator element two optional attributes: role and file-as. The set of values for role is identical to that defined in section 2.2.6 for the dc:Contributor element. The file-as attribute should be used to specify a normalized form of the contents, suitable for machine processing. For example, one might find:
<dc:Creator file-as="King, Martin Luther Jr." role="aut">
Rev. Dr. Martin Luther King Jr.
</dc:Creator>
If a Reading System displays creator information, it must display the contents of all dc:Creator elements, in the order provided, with appropriate separating spacing and/or punctuation.
Multiple instances of the dc:Subject element are supported, each including an arbitrary phrase or keyword. This specification makes no attempt to standardize subject naming schemes, such as the Library of Congress Subject Heading System.
Description of the publication's content.
The publisher as defined by the Dublin Core Metadata Element Set (http://dublincore.org/documents/1999/07/02/dces/).
A party whose contribution to the publication is secondary to those named in dc:Creator elements.
Other than significance of contribution, the semantics of this element are identical to those of dc:Creator. Reading Systems are free to choose to display dc:Creator information without accompanying dc:Contributor information.
This specification adds to the dc:Contributor element two optional attributes: role and file-as. The file-as attribute is defined as for dc:Creator, and is documented in section 2.2.2.
The normative list of values used for the role attribute is defined by the MARC relator code list (http://www.loc.gov/marc/relators/). When roles are specified, the 3-character registered MARC values must be used when applicable. Although that list is extensive, other values may be added if a required role is not covered by those predefined values. Such values must begin with “oth.”, and shall be considered subdivisions of the “other” relator code. Like other constructs in this specification, these values are case-sensitive and must be coded entirely in lower-case.
For convenience, some relator code values are listed here as examples. Consult the MARC code list cited above for the complete list.
Adapter [adp] Use for a person who 1) reworks a musical composition, usually for a different medium, or 2) rewrites novels or stories for motion pictures or other audiovisual medium.
Annotator [ann] Use for a person who writes manuscript annotations on a printed item.
Arranger [arr] Use for a person who transcribes a musical composition, usually for a different medium from that of the original; in an arrangement the musical substance remains essentially unchanged.
Artist [art] Use for a person (e.g., a painter) who conceives, and perhaps also implements, an original graphic design or work of art, if specific codes (e.g., [egr], [etr]) are not desired. For book illustrators, prefer Illustrator [ill].
Associated name [asn] Use as a general relator for a name associated with or found in an item or collection, or which cannot be determined to be that of a Former owner [fmo] or other designated relator indicative of provenance.
Author [aut] Use for a person or corporate body chiefly responsible for the intellectual or artistic content of a work. This term may also be used when more than one person or body bears such responsibility.
Author in quotations or text extracts [aqt] Use for a person whose work is largely quoted or extracted in a work to which he or she did not contribute directly. Such quotations are found particularly in exhibition catalogs, collections of photographs, etc.
Author of afterword, colophon, etc. [aft] Use for a person or corporate body responsible for an afterword, postface, colophon, etc. but who is not the chief author of a work.
Author of introduction, etc. [aui] Use for a person or corporate body responsible for an introduction, preface, foreword, or other critical matter, but who is not the chief author.
Bibliographic antecedent [ant] Use for the author responsible for a work upon which the work represented by the catalog record is based. This may be appropriate for adaptations, sequels, continuations, indexes, etc.
Book producer [bkp] Use for the person or firm responsible for the production of books and other print media, if specific codes (e.g., [bkd], [egr], [tyd], [prt]) are not desired.
Collaborator [clb] Use for a person or corporate body that takes a limited part in the elaboration of a work of another author or that brings complements (e.g., appendices, notes) to the work of another author.
Commentator [cmm] Use for a person who provides interpretation, analysis, or a discussion of the subject matter on a recording, motion picture, or other audiovisual medium.
Compiler [com] Use for a person who produces a work or publication by selecting and putting together material from the works of various persons or bodies.
Designer [dsr] Use for a person or organization responsible for design if specific codes (e.g., [bkd], [tyd]) are not desired.
Editor [edt] Use for a person who prepares for publication a work not primarily his/her own, such as by elucidating text, adding introductory or other critical matter, or technically directing an editorial staff.
Illustrator [ill] Use for the person who conceives, and perhaps also implements, a design or illustration, usually to accompany a written text.
Lyricist [lyr] Use for the writer of the text of a song.
Metadata contact [mdc] Use for the person or organization primarily responsible for compiling and maintaining the original description of a metadata set (e.g., geospatial metadata set).
Musician [mus] Use for the person who performs music or contributes to the musical content of a work when it is not possible or desirable to identify the function more precisely.
Narrator [nrt] Use for the speaker who relates the particulars of an act, occurrence, or course of events.
Other [oth] Use for relator codes from other lists which have no equivalent in the MARC list or for terms which have not been assigned a code.
Photographer [pht] Use for the person or organization responsible for taking photographs, whether they are used in their original form or as reproductions.
Printer [prt] Use for the person or organization who prints texts, whether from type or plates.
Redactor [red] Use for a person who writes or develops the framework for an item without being intellectually responsible for its content.
Reviewer [rev] Use for a person or corporate body responsible for the review of a book, motion picture, performance, etc.
Sponsor [spn] Use for the person or agency that issued a contract, or under whose auspices a work has been written, printed, published, etc.
Thesis advisor [ths] Use for the person under whose supervision a degree candidate develops and presents a thesis, memoir, or text of a dissertation.
Transcriber [trc] Use for a person who prepares a handwritten or typewritten copy from original material, including from dictated or orally recorded material.
Translator [trl] Use for a person who renders a text from one language into another, or from an older form of a language into the modern form.
Date of publication, in the format defined by “Date and Time Formats” at http://www.w3.org/TR/NOTE-datetime and by ISO 8601 on which it is based. In particular, dates without times are represented in the form YYYY[-MM[-DD]]: a mandatory 4-digit year, an optional 2-digit month, and if the month is given, an optional 2-digit day of month.
The dc:Date element has one optional attribute, event. The set of values for event is not defined by this specification; possible values include creation, publication, and modification.
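The date-only form described above can be checked mechanically. The following Python sketch is illustrative only: the helper name `is_date_only` is hypothetical and not part of this specification. It covers only dates without times, and it applies a simple range check rather than full calendar validation.

```python
import re

# YYYY, YYYY-MM, or YYYY-MM-DD; a day is only allowed when a month is present.
_DATE_ONLY = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def is_date_only(value: str) -> bool:
    """Return True if value has the shape YYYY[-MM[-DD]] with plausible ranges.

    Hypothetical helper: it does not handle the date-and-time forms of the
    W3C note, and it does not check month lengths or leap years.
    """
    match = _DATE_ONLY.fullmatch(value)
    if match is None:
        return False
    _year, month, day = match.groups()
    if month is not None and not 1 <= int(month) <= 12:
        return False
    if day is not None and not 1 <= int(day) <= 31:
        return False
    return True
```

Under these assumptions, values such as 2002, 2002-08, and 2002-08-27 are accepted, while 2002-8-27 (one-digit month) and 2002-13 (month out of range) are rejected.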
Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary.
The media type or dimensions of the resource. Best practice is to use a value from a controlled vocabulary (e.g. MIME media types).
A string or number used to uniquely identify the resource. An OEBPS Package must include at least one instance of this element type; however, multiple instances are permitted.
At least one dc:Identifier must have an id specified, so it can be referenced from the package unique-identifier attribute described in Section 2.1.
The dc:Identifier element has an optional attribute defined by this specification: scheme. The scheme attribute names the system or authority that generated or assigned the text contained within the dc:Identifier element, for example “ISBN” or “DOI.” The values of the scheme attribute are case sensitive.
This specification does not standardize or endorse any particular publication identifier scheme. Specific use of URLs or ISBNs is not yet addressed by this specification. Identifier schemes are not currently defined by Dublin Core.
Information regarding a prior resource from which the publication was derived; see the Dublin Core Metadata Element Set (http://dublincore.org/documents/1999/07/02/dces/).
Identifies a language of the intellectual content of the Publication. An OEBPS Package must include at least one instance of this element type; however, multiple instances are permitted. The content of this element must comply with RFC 3066 (see http://www.ietf.org/rfc/rfc3066.txt), or its successor on the IETF Standards Track. The Dublin Core permits other descriptions as well; this specification does not.
An identifier of an auxiliary resource and its relationship to the publication.
The extent or scope of the publication's content. Recommended best practice is to select a value from a controlled vocabulary; see the Dublin Core Metadata Element Set (http://dublincore.org/documents/1999/07/02/dces/).
A statement about rights, or a reference to one. In this specification, the copyright notice and any further rights description should appear directly.
This specification does not address the manner in which a Content Provider specifies to a secure distributor any licensing terms under which readership rights or copies of the content may be sold.
The required manifest provides a list of all the files that are parts of the publication. The manifest element must contain one or more item elements. Each item describes a document, an image file, a style sheet, or other component that is considered part of the publication.
Each item element contained within a manifest element must have the attributes id, href (a URI; if relative, the URI is interpreted as relative to the package file itself), and media-type (specifying the item's MIME media type).
The order of item elements in the manifest is not significant.
For example,
<manifest>
  <item id="intro" href="introduction.html"
        media-type="text/x-oeb1-document" />
  <item id="c1" href="chapter-1.html"
        media-type="text/x-oeb1-document" />
  <item id="c2" href="chapter-2.html"
        media-type="text/x-oeb1-document" />
  <item id="toc" href="contents.xml"
        media-type="text/x-oeb1-document" />
  <item id="oview" href="arch.png"
        media-type="image/png" />
</manifest>
The URIs in href attributes of item elements in the manifest must not use fragment identifiers.
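The manifest rules stated so far (at least one item; id, href, and media-type on every item; no fragment identifiers in href) lend themselves to an automated check. The Python sketch below is one possible approach, not a normative algorithm: the function name `check_manifest` and the wording of its messages are hypothetical, and it assumes the manifest element is supplied on its own as an XML string without a namespace.

```python
import xml.etree.ElementTree as ET

# Attributes that every manifest item must carry.
REQUIRED_ATTRIBUTES = ("id", "href", "media-type")


def check_manifest(manifest_xml: str) -> list:
    """Return a list of human-readable problems found in a manifest element.

    Hypothetical helper: it checks only the rules named above and does not
    verify that the referenced files exist or that media types are accurate.
    """
    problems = []
    manifest = ET.fromstring(manifest_xml)
    items = manifest.findall("item")
    if not items:
        problems.append("manifest contains no item elements")
    for position, item in enumerate(items, start=1):
        label = item.get("id") or "item #%d" % position
        for name in REQUIRED_ATTRIBUTES:
            if item.get(name) is None:
                problems.append("%s: missing %s" % (label, name))
        # A '#' in the href means a fragment identifier is present.
        if "#" in item.get("href", ""):
            problems.append(
                "%s: href must not contain a fragment identifier" % label)
    return problems
```

An empty result means that none of these particular rules was violated; it does not by itself establish that the package conforms.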
This specification defines a set of OEBPS Core Media Types that all conforming Reading Systems must support (as required by this specification). For a publication that uses only those media types, the manifest merely lists the publication's component files directly. However, content providers may construct publications that reference items of additional media types. In order for such publications to be read by all conforming Reading Systems, content providers must provide alternative “fallback” items for each such item. For every item that is not an OEBPS Core Media Type, at least one of its associated fallback items must be of a type drawn from the set of OEBPS Core Media Types.
This specification defines three different mechanisms for specifying OEBPS Core Media Type fallbacks. First, for inline “replaced” resources referenced via the object element, this specification relies on that element's inherent replacement capabilities, described in section 3.3.6. Second, for non-inline destinations, whether referenced from a document or a package, and for inline “replaced” resources referenced via the img element (described in section 3.3.4), the fallback attribute of the item is used. Third, for inline “replaced” resources referenced via the img element, the text value of the alt attribute provides a valid fallback.
An item identifies a fallback item using its fallback attribute, which must specify the ID of the item element that identifies the fallback. Items referenced from fallback attributes may each specify a fallback attribute in turn, forming a longer “fallback path.” For example,
<manifest>
  <item id="item1"
        href="FunDoc.txt"
        media-type="text/plain"
        fallback="fall1" />
  <item id="fall1" fallback="fall2"
        href="FunDoc.html"
        media-type="text/html" />
  <item id="fall2"
        href="FunDoc.oeb"
        media-type="text/x-oeb1-document" />
  <item ...>
</manifest>
If a fallback attribute points to an item that also has a fallback attribute, a Reading System must continue down the fallback path until it reaches a reference to an item of a media type it can display. A Reading System may continue further, and may display any item from the chain. In the absence of element-specific (i.e. img and object) fallback information, every item in a publication that is not of one of the OEBPS Core Media Types must, directly or indirectly, specify a fallback path to an item of one of the OEBPS Core Media Types.
Fallback paths must terminate; circular references are not permitted. Nevertheless, Reading Systems should not fail catastrophically if they encounter such a loop.
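The traversal of a fallback path, including tolerance of an erroneous loop, might be sketched in Python as follows. This is an illustration under stated assumptions, not a required implementation: the name `resolve_fallback`, the dictionary representation of the manifest, and the `displayable` parameter (the set of media types a given Reading System can display) are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical in-memory manifest:
# item id -> (media type, id of the fallback item or None).
Manifest = Dict[str, Tuple[str, Optional[str]]]


def resolve_fallback(manifest: Manifest, item_id: str,
                     displayable: Set[str]) -> Optional[str]:
    """Follow the fallback path from item_id to the first displayable item.

    Returns the id of that item, or None if the path ends, refers to an
    unknown id, or loops back on itself before a displayable item is found.
    """
    seen = set()
    current = item_id
    while current is not None and current in manifest:
        if current in seen:
            return None  # circular path: give up instead of looping forever
        seen.add(current)
        media_type, fallback = manifest[current]
        if media_type in displayable:
            return current
        current = fallback
    return None
```

Applied to the example above, a Reading System that can display only text/x-oeb1-document would arrive at “fall2” when starting from “item1”, while one that can also display text/html would stop at “fall1”.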
Following the manifest, there must be one spine element, which defines a primary linear reading order of the publication. It specifies an ordered list of one or more OEBPS Documents drawn from the manifest, using itemref elements contained within the spine element.
A publication must specify exactly one spine. Reading Systems must treat the file named in the first itemref element within the spine as the first file to be rendered in the reading of the book. The successive files named in its itemref elements are those that are to be rendered using “next-page”-type functionality that may be available in the Reading System.
The spine must refer only to item elements of media type text/x-oeb1-document. Content of other media types may be referenced via OEBPS Documents, which should provide text alternates and other information to enhance accessibility as appropriate.
The spine need not include references to every one of the manifest's item elements that reference OEBPS Documents, because there are means other than the spine for accessing documents in the publication. For example, hypertext links may provide access to documents not in the spine, as may tours and guides (see below).
For example,
<manifest>
  <item id="toc"
        href="contents.html"
        media-type="text/x-oeb1-document" />
  <item id="c1"
        href="chap1.html"
        media-type="text/x-oeb1-document" />
  <item id="c2"
        href="chap2.html"
        media-type="text/x-oeb1-document" />
  <item id="c3"
        href="chap3.html"
        media-type="text/x-oeb1-document" />
  <item id="footnotes"
        href="footnotes.html"
        media-type="text/x-oeb1-document" />
  <item id="f1" href="fig1.jpg" media-type="image/jpeg" />
  <item id="f2" href="fig2.jpg" media-type="image/jpeg" />
  <item id="f3" href="fig3.jpg" media-type="image/jpeg" />
</manifest>
<spine>
  <itemref idref="toc" />
  <itemref idref="c1" />
  <itemref idref="c2" />
  <itemref idref="c3" />
</spine>
In the above example, suppose the document referenced by ID “c1” is being viewed by a reader. When the end of that document is reached, the next document in linear order would be that referenced by ID “c2”. Document “c1” might also have hypertext links to locations in another file such as the “footnotes”. Such a file must be listed in the manifest, but need not be named by any itemref of the spine. If a reader follows the hyperlink in “c1” to “footnotes”, and the end of that file is reached, then no successor in linear order is defined by this specification.
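The “next document in linear order” behavior described above can be expressed compactly. The following Python sketch is illustrative: the name `next_in_spine` and the representation of the spine as a list of idref strings are assumptions, and it further assumes that each ID appears at most once in the spine.

```python
from typing import List, Optional


def next_in_spine(spine: List[str], current_id: str) -> Optional[str]:
    """Return the idref that follows current_id in the spine, if any.

    None is returned both for the last spine entry and for documents that
    are not in the spine at all, since no linear successor is defined then.
    """
    try:
        position = spine.index(current_id)
    except ValueError:
        return None  # document is not part of the linear reading order
    if position + 1 < len(spine):
        return spine[position + 1]
    return None
```

With the spine from the example, the successor of “c1” is “c2”, whereas “footnotes” (listed in the manifest but not in the spine) and “c3” (the last entry) have no successor.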
Much as a tour-guide might assemble points of interest into a set of sightseers' tours, a content provider may assemble selected parts of a publication into a set of tours to enable convenient navigation.
An OEBPS Package may, but need not, contain one tours element, which in turn contains one or more tour elements. Each tour must have a title attribute, intended for presentation to the user. Reading Systems may use tours to provide various access sequences to parts of the publication, such as selective views for various reading purposes, reader expertise levels, etc. Because Reading Systems are not required to implement tour support, content providers should also provide other means of accessing content referenced from tours.
Each tour element contains one or more site elements, each of which must have an href attribute and a title attribute. The href attribute must refer to an OEBPS Document included in the manifest, and may include a fragment identifier as defined in section 4.1 of RFC 2396 (see http://www.ietf.org/rfc/rfc2396.txt). Each site element specifies a starting point from which the reader may explore freely. Reading Systems may use the bounds of the referenced element to determine the scope of the site. If a fragment identifier is not used, the scope is considered to be the entire document. This specification does not require Reading Systems to mark or otherwise identify the entire scope of a referenced element. The order of site elements is presumed to be significant, and should be used by Reading Systems to aid navigation.
Example:
<tours>
  <tour id="tour1" title="Chicken Recipes">
    <site title="Chicken Fingers"
          href="appetizers.html#r3" />
    <site title="Chicken a la King"
          href="entrees.html#r5" />
  </tour>
  <tour id="tour2" title="Vegan Recipes">
    <site title="Hummus" href="appetizer.html#r6" />
    <site title="Lentil Casserole" href="lentils.html" />
  </tour>
</tours>
Within the package there may be one guide element, containing one or more reference elements. The guide element identifies fundamental structural components of the publication, to enable Reading Systems to provide convenient access to them.
Example:
<guide>
  <reference type="toc" title="Table of Contents"
             href="toc.html" />
  <reference type="loi" title="List Of Illustrations"
             href="toc.html#figures" />
  <reference type="other.intro" title="Introduction"
             href="intro.html" />
</guide>
The structural components of the book are listed in reference elements contained within the guide element. These components may refer to the table of contents, list of illustrations, foreword, bibliography, and many other standard parts of the book. Reading Systems are not required to use the guide element in any way.
Each reference must have an href attribute that refers to an OEBPS Document included in the manifest. The href value may include a fragment identifier as defined in section 4.1 of RFC 2396 (see http://www.ietf.org/rfc/rfc2396.txt). Reading Systems may use the bounds of the referenced element to determine the scope of the reference. If a fragment identifier is not used, the scope is considered to be the entire document. This specification does not require Reading Systems to mark or otherwise identify the entire scope of a referenced element.
The required type attribute describes the publication component referenced by the href attribute. The value of the type attribute must be selected from the list defined below when applicable. Other types may be used when none of the predefined types is applicable; their names must begin with the string “other.”. The value of the type attribute is case-sensitive.
The following list of type values is derived from the 13th edition of the Chicago Manual of Style:
cover: the book cover(s), jacket information, etc.
title-page: page with possibly title, author, publisher, and other metadata
toc: table of contents
index: back-of-book style index
glossary: glossary
acknowledgements
bibliography
colophon
copyright-page
dedication
epigraph
foreword
loi: list of illustrations
lot: list of tables
notes
preface
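The rule for type values (a predefined name from the list above, or a name beginning with “other.”, compared case-sensitively) might be checked as in the following Python sketch. The names `GUIDE_TYPES` and `is_valid_guide_type` are hypothetical and are introduced here only for illustration.

```python
# Predefined type values, copied from the list above.
GUIDE_TYPES = frozenset({
    "cover", "title-page", "toc", "index", "glossary", "acknowledgements",
    "bibliography", "colophon", "copyright-page", "dedication", "epigraph",
    "foreword", "loi", "lot", "notes", "preface",
})


def is_valid_guide_type(value: str) -> bool:
    """Return True for a predefined type or a name starting with 'other.'.

    The comparison is case-sensitive, so 'TOC' is not treated as 'toc'.
    """
    return value in GUIDE_TYPES or value.startswith("other.")
```

Note that this check cannot tell whether an “other.” name was used even though a predefined type was applicable; that judgment remains with the content provider.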
OEBPS 1.0 provided document authors with a convenient “Basic” document vocabulary (a set of elements and attributes, the “tagset”) that all OEB Reading Systems must recognize. This vocabulary was selectively drawn from the HTML 4.01 tagset, essentially conforming to XHTML 1.0 Transitional. A Document Type Definition (DTD) of the Basic vocabulary (the “OEBPS 1.0 Document DTD”) was provided for optional validation purposes, to ensure that Basic OEBPS Documents conformed to the recommended content models and the allowed attribute values of the vocabulary.
This specification similarly continues support for a “Basic” document vocabulary which all OEBPS 1.2 Reading Systems must recognize.
The Basic OEBPS 1.2 Document vocabulary is a pure subset of XHTML 1.1; the elements and attributes selected for inclusion are listed in the table in Section 3.2.2. Appendix B includes the Basic OEBPS 1.2 Document DTD, which expresses the Basic OEBPS Document vocabulary and is in strict conformance with the XHTML 1.1 DTD with modularization removed. Appendix C includes the mnemonic character entities file associated with the Basic OEBPS 1.2 Document DTD. Appendix D describes the differences between the Basic OEBPS 1.2 and 1.0.1 Document vocabularies.
All Basic OEBPS Documents that validate to the Basic OEBPS 1.2 Document DTD will also validate to the XHTML 1.1 DTD. It is strongly recommended that all Basic OEBPS Documents be valid XML documents with respect to the Basic OEBPS Document DTD.
Except where noted in this section and elsewhere, the semantics and expected rendering behavior of the Basic OEBPS 1.2 Document vocabulary are as defined in XHTML 1.1. XHTML 1.1 relies heavily upon HTML 4.01 for semantic definitions and expected User Agent rendering behavior (http://www.w3.org/TR/html401/).
The Basic OEBPS Document vocabulary, following XHTML 1.1, defines five Common attributes that may be applied to nearly all the elements in the Basic OEBPS Document vocabulary. These [Common] attributes consist of xml:lang and the [Core] attributes id, style, class, and title. These attributes are not individually listed in the element and attribute list in the following section 3.2.2, except to note their absence from the few exceptional elements.
These Common attributes may also be applied to non-Basic elements in Extended OEBPS Documents.
Because of their general importance, certain usage restrictions, and Reading System conformance issues, the Common attributes are further described below. Except where further restricted, the data types for the attribute values conform with XHTML 1.1 (and with the Basic OEBPS Document DTD in Appendix B).
3.2.1.1 id
This attribute is used to give a unique identifier to an element. Its value must be of the XML data type ID with the token “Name” (the normative syntax of “Name” is precisely defined in section 2.3 of the XML 1.0 specification).
Values for id must be unique across all elements in a single document. In addition, the value of id should not start with the string 'xml' or any of its case variants, since this prefix is reserved in the XML specification for possible future standardization.
In this specification, the value of id must start with a “Letter” – it cannot start with an underscore (_) or colon (:) as otherwise allowed in XML 1.0. The character set defined by “Letter” is specified in Appendix B of the XML 1.0 specification.
For general HTML compatibility, document authors should further restrict the first character value of id to the Basic Latin letter characters (A-Za-z) and the remaining characters to (A-Za-z0-9.-_).
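The HTML-compatible restriction on id values, combined with the advice to avoid the reserved 'xml' prefix, can be approximated with a short check. The Python sketch below is illustrative only: `is_portable_id` is a hypothetical name, and the sketch treats the 'xml' prefix as a rejection even though the text above states it as a recommendation (“should not”). It does not check uniqueness within a document.

```python
import re

# First character: Basic Latin letter; remaining characters: A-Za-z0-9.-_
_PORTABLE_ID = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_portable_id(value: str) -> bool:
    """Return True if value follows the HTML-compatible id restriction.

    Values starting with 'xml' in any letter case are rejected as well,
    which is stricter than the 'should not' wording requires.
    """
    if value[:3].lower() == "xml":
        return False
    return _PORTABLE_ID.fullmatch(value) is not None
```

For example, “chap1” and “a.b-c_d” pass this check, while “_note”, “1a”, and “XMLdata” do not.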
3.2.1.2 style (deprecated)
The core attribute style, used to apply CSS styling directly to an element, is deprecated in this specification as it is in XHTML 1.1.
It is strongly recommended that the style attribute not be used in OEBPS 1.2 Documents; instead, use the style element or, preferably, an external style sheet to specify the styling of any element.
3.2.1.3 class
This attribute allows selector-based style specifications. Its value must be a space-separated list of class names.
3.2.1.4 title
This attribute may be used to provide an “advisory title/amplification” for the element. Reading Systems may ignore its value.
3.2.1.5 xml:lang
This attribute may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document. The attribute value of xml:lang must comply with RFC 3066 (see http://www.ietf.org/rfc/rfc3066.txt), or its successor on the IETF Standards Track.
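RFC 3066 defines a language tag as a primary subtag of one to eight letters, optionally followed by hyphen-separated subtags of one to eight letters or digits. A purely syntactic check of that shape might look like the following Python sketch; the name `has_rfc3066_syntax` is hypothetical, and the check does not consult any registry, so a well-shaped but meaningless tag would still pass.

```python
import re

# Primary subtag: 1 to 8 letters; each further subtag: 1 to 8 letters or digits.
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def has_rfc3066_syntax(tag: str) -> bool:
    """Return True if tag has the syntactic shape of an RFC 3066 language tag.

    Syntax only: whether the subtags are actually registered or meaningful
    is not checked.
    """
    return _LANGUAGE_TAG.fullmatch(tag) is not None
```

Tags such as “en”, “en-US”, and “fr-CA” have the expected shape, whereas “en_US” (underscore instead of hyphen) and “en-” (empty subtag) do not.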
This section lists all the elements and associated attributes included in the Basic OEBPS 1.2 document vocabulary. They are drawn from the XHTML 1.1 vocabulary specified at http://www.w3.org/TR/xhtml11/.
Refer to the Basic OEBPS Document DTD (Appendix B), the XHTML 1.1 specification, and section 3.3 for attribute value and other restrictions.
Table Notes:
| Element | Short Description | Supported Attributes | Document Structure Level | May Contain (XHTML 1.1) |
|---|---|---|---|---|
| a | Anchor | [Common], href, rel, rev | Inline | PCDATA; [Inline] (except a); [BlockOrInline] |
| abbr | Abbreviation | [Common] | Inline | PCDATA; [Inline]; [BlockOrInline] |
| acronym | Acronym | [Common] | Inline | PCDATA; [Inline]; [BlockOrInline] |
| address | Address | [Common] | Block | PCDATA; [Inline]; [BlockOrInline] |
| area | Client-Side Image Map Area | [Common], alt, coords, href, nohref, shape | Miscellaneous | [Empty] |
| b | Bold Text Style | [Common] | Inline | PCDATA; [Inline]; [BlockOrInline] |
| base | Document Base URI | href | Head | [Empty] |
| big | Large Text Style | [Common] | Inline | PCDATA; [Inline]; [BlockOrInline] |
| blockquote | Long Quotation | [Common], cite | Block | [Block]; [BlockOrInline] |
| body | Document Body | [Common] | Top | [Block]; [BlockOrInline] |
| br | Forced Line Break | [Core] | Inline | [Empty] |
| caption | Table Caption | [Common] | Table | PCDATA; [Inline]; [BlockOrInline] |
| cite | Citation | [Common] | Inline | PCDATA; [Inline]; [BlockOrInline] |
| code | Computer Code Fragment | [Common] | Inline | PCDATA; [Inline]; [BlockOrInline] |
| col | Table Column | [Common], align, span, valign, width | Table | [Empty] |
| colgroup | Table Column Group | [Common], align, span, valign, width | Table | col |
| dd | Definition Description | [Common] | List | PCDATA; [Block]; [Inline]; [BlockOrInline] |
| del | Deleted Text | [Common], cite, datetime | Block Or Inline | PCDATA; [Block]; [Inline]; [BlockOrInline] |
| dfn | Instance Definition | [Common] | Inline | PCDATA; [Inline]; [BlockOrInline] |
| div | Generic Block Level Container | [Common] | Block | PCDATA; [Block]; [Inline]; [BlockOrInline] |
| dl | Definition List | [Common] | Block (List) | dd; dt |
| dt | Definition Term | [Common] | List | PCDATA; [Inline]; [BlockOrInline] |
| em | Emphasis | [Common] | Inline | PCDATA; [Inline]; [BlockOrInline] |
| h1 to h6 | Heading | [Common] | Block | PCDATA; [Inline]; [BlockOrInline] |
| head | Document Head | xml:lang | Top | [Head]; object; script |
| hr | Horizontal Rule | [Common] | Block | [Empty] |
| html | Document Root Element | xmlns, xml:lang | Top (Document Root) | head, body |
| i | Italic Text Style | [Common] | Inline | PCDATA; [Inline]; [BlockOrInline] |
| img | Embedded Image | [Common], alt, height, longdesc, src, usemap, width | Inline | [Empty] |
| ins | Inserted Text | [Common], cite, datetime | Block Or Inline | PCDATA; [Block]; [Inline]; [BlockOrInline] |
| kbd | Text Entered by the User | [Common] | Inline | PCDATA; [Inline]; [BlockOrInline] |
| li | List Item | [Common] | List | PCDATA; [Block]; [Inline]; [BlockOrInline] |
| link | Media-Independent Link | [Common], href, media, rel, rev, type | Head | [Empty] |
| map | Client-Side Image Map | [Common] (id is required) | Inline | [Block]; [BlockOrInline]; area |
| meta | Generic Metadata Information | content, name, scheme, xml:lang | Head | [Empty] |
| noscript | Fallback Content For Non-Executable Script | [Common] | Block Or Inline | [Block]; [BlockOrInline] |
| object | Generic Embedded Object | [Common], archive, classid, codebase, codetype, data, height, type, usemap, width | Inline | PCDATA; [Block]; [Inline]; [BlockOrInline]; param |
| ol | Ordered List | [Common] | Block (List) | li |
| p | Paragraph | [Common] | Block | PCDATA; [Inline]; [BlockOrInline] |
| param | Named Property Value | id, name, type, value, valuetype | Miscellaneous | [Empty] |
| pre | Preformatted Text | [Common], xml:space | Block | PCDATA; script; [Inline] except big, img, object, small, sub, sup |
| q | Inline Quotation | [Common], cite | Inline | PCDATA; [Inline]; [BlockOrInline] |
| samp | Program, Script, and Similar Output | [Common] | Inline | PCDATA; [Inline]; [BlockOrInline] |
| script | Script Statements | type, xml:space | Block Or Inline | PCDATA |
| small | Small Text Style | [Common] | Inline | PCDATA; [Inline]; [BlockOrInline] |
| span | Generic Inline Level Container | [Common] | Inline | PCDATA; [Inline]; [BlockOrInline] |
| strong | Strong Emphasis | [Common] | Inline | PCDATA; [Inline]; [BlockOrInline] |
| style | Style Information | title, type, xml:lang, xml:space | Head | PCDATA |
| sub | Subscript | [Common] | Inline | PCDATA; [Inline]; [BlockOrInline] |
| sup | Superscript | [Common] | Inline | PCDATA; [Inline]; [BlockOrInline] |
| table | Table | [Common], border, cellpadding, cellspacing, summary, width | Block (Table) | caption; col; colgroup; tbody; thead; tfoot; tr |
| tbody | Table Body | [Common], align, valign | Table | tr |
| td | Table Data Cell | [Common], abbr, align, colspan, rowspan, valign | Table | PCDATA; [Block]; [Inline]; [BlockOrInline] |
| tfoot | Table Footer | [Common], align, valign | Table | tr |
|
th |
Table Header Cell |
[Common], abbr, align, colspan, rowspan, valign |
Table |