This article provides a brief guide to ebook conversion, covering what it involves and how to select a conversion service, with an emphasis on maintaining quality.

For discussion of when ebook conversion – as distinct from ebook production – makes sense, see the related article ‘Ebook conversion or ebook production?’ A combined version of these two articles was published in IBPA Independent, July 2011.

What is ebook conversion?

Ebook conversion is the process of taking a book that is either in analog (print) format or in a digital format that is incompatible with ebook readers (apps, devices, browsers, etc), and converting it into one or more digital formats optimised for digital delivery.

An example of an incompatible digital file is the press-ready PDF that is supplied to an offset printing press. A digital file for printing on paper is a ‘flat’ file that includes data such as printer’s marks, specific fonts and CMYK colour coding. The file size is usually 10MB or more, even for books printed in black ink only with just text and no graphics. A digital file optimised for screen is a dynamic or interactive file that includes data such as font sets, RGB colour coding, interactivity such as bookmarks and hyperlinks, and embedded metadata (which may include DRM settings). The file size ranges from under 1MB up to about 8MB – long black-and-white text-only books such as novels have a much smaller file size than short books with colour graphics such as children’s stories. File size matters for ebooks, which are transmitted and downloaded over the internet and read on battery-operated devices, whereas for press files the most important factor affecting size is print resolution.

Whether converting from analog – ‘hard copy’ – or from an existing digital file, the process involves substantial work in quality control. The controls needed differ slightly for each process.

Digitising hard copy

The usual process for digitising print books is scanning: the physical pages are imaged and optical character recognition (OCR) software reads the text. The two best-known digitising projects are Project Gutenberg and Google Books. Of course, anyone with a scanner can do this, but it is a tedious and time-consuming process.

Scanning is also prone to certain types of error arising from font ligatures, kerning and print quality in the original. Some errors can be found with a spell check, but there is no substitute for a human eye familiar with the original text. Without this, the result is wonderful infelicities such as “Sir Leicester is fidgety down at Chesney Wold, with no better company than the goat”, where the original has ‘gout’, which readers would know is Sir Leicester’s constant companion. (This reading appears in the Project Gutenberg edition of Dickens’ Bleak House that is provided on iBooks. The prevalence of ‘goat’ rather than the original ‘gout’ in nearly all ebook versions shows how important it can be to get this right, and would make an interesting study in modern textual provenance.)

The need for proofing adds to the time and cost of such digitisation. To overcome this, the ingenious reCaptcha system uses humans to interpret scanned text in the context of everyday web use such as filling in forms. This isn’t foolproof: in the example above, if the font is unclear and the original context is missing, even humans are more likely to read the familiar word ‘goat’ than the less familiar ‘gout’. There is no substitute for a good proofreader checking the text against the original.
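
Part of the comparison against the original can be mechanised when a second copy of the text is available, for example an independently keyed or scanned version. The following is a minimal sketch of that idea, not a description of how any digitising project actually works. It uses the difflib module from Python’s standard library to list the words on which two texts disagree; the function name word_differences and the sample strings are assumptions made for illustration.

```python
import difflib


def word_differences(original: str, scanned: str) -> list[tuple[str, str]]:
    """Return (original words, scanned words) for every place the texts disagree."""
    original_words = original.split()
    scanned_words = scanned.split()
    matcher = difflib.SequenceMatcher(a=original_words, b=scanned_words, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record the disagreeing runs of words from each side.
            differences.append(
                (" ".join(original_words[i1:i2]), " ".join(scanned_words[j1:j2]))
            )
    return differences


# Hypothetical sample: a reference reading and a scanned reading of the same phrase.
original = "with no better company than the gout"
scanned = "with no better company than the goat"
print(word_differences(original, scanned))  # [('gout', 'goat')]
```

A spell check would pass both sentences, because ‘goat’ is a real word. The comparison only flags the discrepancy; a proofreader still has to decide which reading is right.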

The scanning process outputs to a digital file – different scanning software exports to different ranges of format – that can be manipulated in some way, eg by running a spell-check, but it is still not an ebook.

From digital file to ebook

A scanned digital file or a book that is in PDF, Word, InDesign or other digital format still needs to be converted into XML or enhanced PDF to be a usable ebook. Although XML formats are becoming more prevalent, many ebook sellers and libraries still accept PDF, and for reading certain types of content on desktop screens it is even preferred. Almost anyone with a good enough grasp of Adobe Acrobat can perform the necessary enhancements on a PDF such as page cropping, web optimisation, adding a bookmarked table of contents and insertion of metadata.

XML is more difficult. XML is the language on which most ebook formats are based, whether Kindle (.mobi), ePub or DAISY formats. A book in ePub format, for example, actually comprises a set of files “zipped” together and packaged with the .epub file extension; Kindle’s .mobi uses a different container, but the marked-up content inside it is similar. The ease of converting your digital file to XML will depend on how well your book’s structure and content fit into an XML schema, as well as the quality of the original file.
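
To make the “zipped” idea concrete for the ePub case, here is a minimal sketch in Python using only the standard zipfile module. It builds a small archive in memory and lists what is inside. The file names and placeholder contents are assumptions chosen for illustration, and the result is not a valid, readable ebook; the one detail taken from the ePub packaging rules is that a file named mimetype comes first and is stored uncompressed.

```python
import io
import zipfile

# Illustrative file names and placeholder contents only: this shows the
# packaging idea and does NOT produce a valid, readable ebook.
DEMO_FILES = {
    "META-INF/container.xml": "<container/>",
    "OEBPS/content.opf": "<package/>",
    "OEBPS/chapter1.xhtml": "<html/>",
    "OEBPS/style.css": "p { text-indent: 1em; }",
    "OEBPS/images/cover.jpg": "",
}


def build_demo_package() -> bytes:
    """Zip the demo files together in memory and return the archive bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # The mimetype entry goes first and is stored without compression.
        package.writestr(
            "mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED
        )
        for name, content in DEMO_FILES.items():
            package.writestr(name, content, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def list_package(data: bytes) -> list[str]:
    """Return the names of the files inside a zipped package, in stored order."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return package.namelist()


for name in list_package(build_demo_package()):
    print(name)
```

The same list_package function can be pointed at the bytes of a real .epub file to see its content, stylesheet and media files.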

The XML package has several components, the main ones being the marked-up content file, the stylesheet file and the media files (eg images or video). The content file contains the words arranged under tags like <chapter> and <para> to define the structure of your book. The stylesheet (often referred to as ‘CSS’ for ‘cascading style sheet’) defines how those structural elements are presented: size, font, colour, and so on.
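
As a small illustration of structure kept apart from presentation, the sketch below parses a tiny content file with Python’s standard xml.etree.ElementTree module and reports how many paragraphs each chapter holds. The <book> wrapper, the title attribute and the sample stylesheet text are assumptions for the example. The <chapter> and <para> tags follow the wording above; actual ePub content files use XHTML tags instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical content file: the words of the book, arranged under structural tags.
CONTENT = """<book>
  <chapter title="One">
    <para>First paragraph.</para>
    <para>Second paragraph.</para>
  </chapter>
  <chapter title="Two">
    <para>Third paragraph.</para>
  </chapter>
</book>"""

# Hypothetical stylesheet: says how <para> looks, and nothing about what it contains.
STYLESHEET = "para { font-family: serif; font-size: 1em; text-indent: 1em; }"


def outline(xml_text: str) -> list[tuple[str, int]]:
    """Return (chapter title, number of paragraphs) for each chapter."""
    root = ET.fromstring(xml_text)
    return [
        (chapter.get("title", ""), len(chapter.findall("para")))
        for chapter in root.iter("chapter")
    ]


print(outline(CONTENT))  # [('One', 2), ('Two', 1)]
```

Changing STYLESHEET would alter how every paragraph is displayed without touching the content file at all.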

If you already use defined sections and styles in your book files, it’s not a great leap from structured InDesign or Word to XML. For example, you might specify that the text style in your book will be Book Antiqua 11pt on 13pt with a first line indent of 1 em space, and you give this style the name Body. As long as each paragraph has the Body style applied to it, a change to the style (say, changing the font to Times New Roman 10pt on 12pt) will be applied globally across all the paragraphs. In ebook production (as distinct from conversion) the Body style would be mapped to the XML tag <para> and the CSS would define how the <para> content will look. This allows you to have one style for your print books and a different style for your ebooks, or to make your ebooks match your print style. Either way, this is default styling only: many ebook apps allow readers to apply different styles.
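
The mapping from named styles to tags can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the way InDesign or Word export works: the style name Chapter Title, the <title> tag and the function name are hypothetical, and only the Body to <para> mapping comes from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from paragraph style names to XML tags.
STYLE_TO_TAG = {
    "Chapter Title": "title",
    "Body": "para",
}


def paragraphs_to_xml(paragraphs: list[tuple[str, str]]) -> str:
    """Wrap (style name, text) pairs in the XML tag mapped to each style."""
    chapter = ET.Element("chapter")
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style)
        if tag is None:
            # An unstyled or oddly styled paragraph is exactly what needs human review.
            raise ValueError(f"No XML mapping for style {style!r}")
        ET.SubElement(chapter, tag).text = text
    return ET.tostring(chapter, encoding="unicode")


source = [
    ("Chapter Title", "Chapter One"),
    ("Body", "First paragraph."),
    ("Body", "Second paragraph."),
]
print(paragraphs_to_xml(source))
# <chapter><title>Chapter One</title><para>First paragraph.</para><para>Second paragraph.</para></chapter>
```

A paragraph with no recognised style stops the process with an error, which is why consistently applied styles in the source file make the conversion so much smoother.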

Because quality counts

Ebook conversion brings its own set of challenges when it comes to quality control: text, images, navigation and display coding will all need to be checked for errors. Some processes and some source files are more error-prone than others; it can often be easier to code a book in XML from scratch.

Converting from PDF to XML generally produces the worst results, unless competent humans check the output thoroughly and are able to clean up the messy code generated in the process. Images usually need extraction for individual manipulation and optimisation.

Converting from Word or InDesign produces better results, especially with a recent Word or InDesign version that uses XML, but a human must still be involved to check that the mapping process has worked and to find any other coding errors.

Whatever process is used, and whatever type the source file is, quality control is as important when you’re converting to an ebook as when you’re producing a print book.
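
Some coding errors can be caught before a human ever opens the file. The sketch below is one small automated check, not a full validation process: it uses Python’s standard xml.etree.ElementTree parser to report whether a piece of XML is well-formed. The function name and the sample markup are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def well_formedness_error(xml_text: str) -> Optional[str]:
    """Return the parser's error message, or None if the XML is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return None


# Hypothetical samples: the second is missing its closing </para> tag.
good = "<chapter><para>Text</para></chapter>"
bad = "<chapter><para>Text</chapter>"

for sample in (good, bad):
    problem = well_formedness_error(sample)
    print("OK" if problem is None else f"Error: {problem}")
```

A well-formed file can still display badly, so a check like this complements on-screen proofing and does not replace it.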

Finding an ebook conversion service

Finding ebook conversion services is not difficult – a quick Internet search will deliver many results. Finding one you trust and can work with is much harder.

Internet search results are also hit-and-miss. You may do better by spending some time looking at ebooks within and across genres. This will give you a good sense of what does and doesn’t work well in an ebook, and also provide a basis for conversations with prospective conversion service providers.

Contact the publishers of ebooks with high production values and find out who handled production. Even if production was done in-house, you can ask about formats and processes – most people are willing to share some of their expertise and experience.

As you continue to explore service providers, be mindful that the cheapest conversion services generally lack quality control and additional services such as distribution. If you are confident that you can handle these aspects yourself, a low-cost service may suit you. If not, you may need to budget for a premium service. A guide to evaluating services is provided below.

Can you go it alone? You may have noticed that cheap and free automated conversion tools are readily available. Don’t be tempted to use these unless you are familiar with both XML coding and CSS, as you will need to spend considerable time cleaning up the XML and CSS code of the output file.

Evaluating conversion services

There is no one-size-fits-all solution to choosing an ebook conversion service. Some services look after the entire process for you, including file validation and upload to your ebook distributor(s). Others simply return your ebook file for you to distribute. Some work closely with you on the style elements, producing tailored designs that work for screen; others reproduce the style of your print book, whether it works well or not.

Here is a brief checklist of things to look for:

  • What file formats are accepted? The best ebook converters will accept a range of common file formats, including InDesign. Be wary of converters who accept only PDF files (even if this is the only format you have) as they may be using automated software. Automated software is fine only with competent humans doing clean-up, proofing and recoding (see next point).
  • Who is responsible for proofing and error-checking? Will any tools be provided to help you in any proofing that is your responsibility? The entire converted ebook must be proofread by the publisher (or a trusted representative) from start to finish at some stage – but ideally this should be done once to pick up any minor errors, not again and again as may be necessary if the conversion team does not have its own checking procedures. Remember that this is not just about checking the words and structure, but also the code – errors in the CSS or the XML may not be obvious until both are rendered on-screen in an e-reader.
  • Are additional services such as file validation offered, or are you on your own once you receive the converted file? Either way, does the service take responsibility for meeting ebook vendors’ file validation requirements?
  • How much consultation is offered? That is, how personal is the service, and is the process presented as a partnership? If your book has complex elements that can’t be preserved or easily rendered in ebook format, will the service suggest alternative ways of presenting those elements?
  • Will all files be returned to you? Most conversion services provide you with the converted ebook files, in the same way printers provide you with the books after printing. However, services offered by some distributors – often free or at very low cost – may not include returning your ebook files to you, instead locking the ebook into their systems. If the conversion service includes maintaining an XML database and managing your ebooks on your behalf, you may not have access to the files. This can be a good thing if you do not have the capacity to manage your new files, at least in the short term. But it’s essential to make the intellectual property arrangements clear, in case the relationship sours or the business circumstances of either party change.
  • Does the conversion service offer multiple ebook formats? While there are differences between Kindle and ePub formats, the underlying architecture is similar. A good conversion service will offer at least these two formats. A premium service will provide other formats as well, such as DAISY and fixed-layout ePub.

© Linda Kythe Nix 2011. All rights reserved.