Framework for Gathering and Distributing eBook Publishing and Retail Data

Revised and Published:
This document describes the International Digital Publishing Forum’s methodology for collecting and reporting data on sales of eBooks.
The purpose of this framework is to describe how the International Digital Publishing Forum (IDPF) can gather data from publishers and retailers and how to distribute that information to the IDPF membership, press, and public.
The goals of gathering data and distributing the results are:
Here are the basic steps to the statistics program:
Of importance to the publishers and retailers are the three C’s: convenience, consistency, and confidentiality.
Convenience is defined as enabling the publishers and retailers to enter their data online via a web page regardless of their time zone or the day of the week. If a publisher or retailer chooses to submit data on a paper form, the IDPF will provide a postal address or fax number. The IDPF will then enter the data from the paper form into the web page for inclusion in the database.
Consistency is defined as using the same data points consistently for at least one year. Simply put, changes to the forms would be made annually, if necessary, but no more often than that. This will enable publishers and retailers to get used to gathering data needed for the forms. Also, the period of time to submit data would be the same period every quarter, such as the last week of the last month in the quarter. (This is subject to change as publishers and retailers may want to report data one or two weeks after they close out a quarter. A consensus would be required from the publishers and retailers on when to collect the data.) The only data points that would change would be the mini-surveys. These survey questions would be optional and the publishers and retailers could choose to ignore the questions.
Confidentiality is defined as ensuring that the data submitted by the publishers and retailers cannot be traced back to them. When publishers and retailers log onto the web page, the IDPF will capture neither the identity nor the Internet Protocol address of the password holder. Thus a publisher or retailer can log on and enter their data, and no tracking information will be captured. Furthermore, once the data report has been created for a quarter, the database will be purged of all records, which will further ensure confidentiality. If publishers and retailers mail or fax a hardcopy form, the IDPF will enter the data into the web page and destroy the paper form.
The goal of collecting data from publishers and retailers is to make the submission routine for the publishers and retailers and protect their intellectual property by ensuring their privacy.
For consumers, there are two pieces of information that are vital: currency and validity.
Currency is defined as the age of the data at the time it is reported to consumers. Depending upon when the publishers and retailers agree they can provide data, it may be possible to report data within weeks after the close of a quarter. Conversely, it may be that data will need to lag a full quarter and thus represent results that are six months old. In any event, consumers will be made aware of the age of the data.
Validity is defined as how trustworthy the data is, given how it is collected and who provides it. Every report will describe the methodology, and since the methodology used to collect and analyze the data should always be the same, this will amount to providing template information to the consumers. Of more importance to consumers is how many of the publishers and retailers reported data. In social science research, a key measurement is the non-response rate. Simply put, if 200 publishers and retailers have agreed to provide data but only 50 provide data during a given reporting period, the validity of the data would be suspect. Thus, the number of publishers and retailers reporting data each period must be provided to the consumers.
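The non-response example above can be worked through in a few lines. This is only an illustrative sketch: the function name and its parameters are assumptions made for this example and are not part of the IDPF program.

```python
def non_response_rate(enrolled: int, responded: int) -> float:
    """Return the fraction of enrolled companies that did not report.

    `enrolled` is the number of publishers and retailers that agreed to
    provide data; `responded` is how many actually did so in the period.
    Both names are hypothetical, chosen for this illustration.
    """
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    if not 0 <= responded <= enrolled:
        raise ValueError("responded must be between 0 and enrolled")
    return (enrolled - responded) / enrolled


# The figures from the text: 200 companies agreed, 50 reported.
rate = non_response_rate(enrolled=200, responded=50)
print(f"Non-response rate: {rate:.0%}")
```

With the figures from the text, three out of four enrolled companies did not report, which is why the number of reporting companies needs to accompany every set of results.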
Given that the ePublishing industry is somewhat different from the publishing industry because of technologies like eBooks, Print On Demand, and the Internet, the barriers to launching a publishing house or a retailer are lower. Therefore, there need to be criteria for deciding which publishers and retailers are eligible to submit data.
Here are the requirement guidelines. Please note that eligibility will be determined on a case-by-case basis by the IDPF; publishers and retailers are encouraged to apply to submit data. Also, any organization that is a member of the IDPF can participate in the survey.
The following guidelines will be used:
The data points that will be collected via the survey are:
Publisher Numbers:
Retailer Numbers:
If publishers and retailers do not have actual numbers available, they will be able to denote that the number provided is an estimate. It is important that all data fields be completed as lack of data will impact the survey results.
Publishers and retailers will identify by percentage the number of eBooks published and/or sold based on the definitions provided in Appendix A: Categories for Publishing and Sales Data.
To meet the goals of “Provide meaningful data, which is defined as number of eBooks published and number of eBooks sold to the IDPF membership and companies who contribute data to the ePublishing sales statistics” and “Enable publishers and retailers to communicate issues or requirements via ‘mini-surveys’ to other companies, such as software developers, in the IDPF”, the IDPF membership and any company that contributed data to the survey will be provided with the following data in a report. Please note that NO individual company data will be revealed and all numbers will be reported as AGGREGATE numbers (the sum of all submitted data):
To meet the goal of “Provide an overview of the data to the public, which is defined as consumers and the press”, the data will be made available to the public in the form of a press release on the IDPF website. The press release will contain a headline, a summary paragraph, two sections reporting data from publishers and retailers, any relevant mini-survey questions, and an explanation of the methodology used to collect the data.
Note: The IDPF will publish actual aggregate numbers to the public. That is, all of the numbers submitted to the IDPF will be summed and reported to the public as industry numbers. Further, the IDPF will report in alphabetical order the names of the companies who contributed to the survey. Please note that the IDPF will provide less detailed information to the public in order to maintain the value of the survey and encourage companies to participate in order to get the complete statistical report.
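The aggregation described in the note can be sketched as follows. This is a minimal illustration under stated assumptions, not the IDPF's actual implementation: the function name, the field names `ebooks_published` and `ebooks_sold`, and the company names are all invented for the example. The submissions are kept separate from the contributor list, on the assumption that submitted figures carry no company identity.

```python
def aggregate_report(
    submissions: list[dict[str, int]],
    contributors: list[str],
) -> dict:
    """Sum every submitted field and list contributors alphabetically.

    `submissions` holds anonymous rows of numbers (hypothetical field
    names); `contributors` is a separately kept list of company names.
    No per-company figure appears in the result.
    """
    totals: dict[str, int] = {}
    for row in submissions:
        for field, value in row.items():
            totals[field] = totals.get(field, 0) + value
    return {
        "totals": totals,
        # Case-insensitive sort so "alpha" is not placed after "Gamma".
        "contributors": sorted(contributors, key=str.casefold),
    }


# Invented sample data, for illustration only.
report = aggregate_report(
    submissions=[
        {"ebooks_published": 10, "ebooks_sold": 100},
        {"ebooks_published": 20, "ebooks_sold": 500},
    ],
    contributors=["Gamma Retail", "alpha Press", "Beta Books"],
)
print(report["totals"])
print(report["contributors"])
```

Only the summed totals and the sorted list of names leave the function, which mirrors the rule that no individual company data is revealed.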
Consumers will be told how the data was gathered, when the data was collected, and how many companies contributed or did not contribute data to the survey.
Please see the press release announcing the Q2 2003 results for an example of how the IDPF will handle the data collected in the program.
Please see the Help webpage on the IDPF website for answers to frequently asked questions.