Are books data?


The arguments

Some people out there will tell you that books are data and others will tell you that they simply aren't. Take for example the following three articles.

1. A Publisher’s Job Is to Provide a Good API for Books: You can start with your index (Hugh McGuire)

Early last year this article by Hugh McGuire set in motion an excited discussion across the Internet about the possibility of looking at the index as an API and the book as data.  

2. If Book Then . . . what now? Books face the future (Frank Rose)

A couple of months later Frank Rose's article appeared, which reported on the IfBookThen conference, where the argument very much centred around books as tech, data, and opportunities to hyperlink to other places.

3. The unXMLing of digital books (Liza Daly)

It seems strange that, after all the excitement around the book as data that the above two articles reflect, we are back to calling the book a blob and being told, as Liza Daly argues in her article, that we shouldn't worry about structuring books too strictly.

So what's the truth?

The truth is that each of these people is thinking about a different type of book at a different point in time. Liza Daly is thinking about the book right now and how to do things like transform WordPress websites into books. People thinking about books as data, by contrast, have their minds in the near future: they are thinking less about transforming blogs or even popular fiction into books and more about a wider range of books, from fiction to reference material and everything in between.

All of these people are right in their own way, and although Liza Daly's position runs contrary to the others', it is thinking similar to hers that enabled Amazon to take control of ebooks.

By keeping things simple with Kindle and not investing time and energy in supporting the complexities of EPUB, Amazon was able to race ahead, and the company is now well in the lead. While we cling to the fact that EPUB is a better technology, the Kindle is a better experience and has more titles available. Hence, the latter is winning.

Similar truths apply here. Thinking of the book as a blob enables us to push ahead, publish everything and get on with making money. Thinking of the book as data, on the other hand, gives us cleaner, future-proof books that can be transformed into future formats more easily, but which take longer and cost more to produce.

And what does this mean for the future?

The "data" concept remains the ideal while the "blob" concept gives us licence to get things done. The catch is that the existence of the blob concept slows the adoption of the data concept, but really that's always been the case in publishing: everyone has talked about XML and XSLT workflows, and before that SGML, for as long as I can remember, but the easiest way to get things done is still to import Word docs into InDesign.

The availability of an easy option has slowed the adoption of the slightly more complex XML option, even though the latter provides greater flexibility and better future-proofing. And this is tied to human nature. There will be those who make giant leaps ahead of the rest, but for the everyday money earners the blob concept keeps things ticking over.

Things that are still bothering me about the blob

If we go as far as loosening the strings on the markup of ebook content (as suggested by Liza Daly) then this potentially causes problems for app makers (and ereaders). The concern is that there are already well-established tools for parsing and processing XML within the major programming and scripting languages, and not only can these be used to validate content, but they can also be used to extend what can be done with it. If the content is lazy and imprecise then it restricts how far the developer can utilise these tools, or it means that apps must clean the content and guess at certain conventions before they can do anything more useful with it.
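To make that concern concrete, here is a minimal sketch using Python's standard-library XML parser. The two chapter fragments and the `paragraphs` helper are invented for illustration and are not taken from any real ebook or from Liza Daly's article; the point is only that a strict parser hands back usable structure for well-formed markup and refuses the sloppy version outright.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter fragments, invented for this illustration.
STRICT = "<chapter><title>Blobs</title><p>First paragraph.</p><p>Second.</p></chapter>"
# Same content, but with closing tags left out, as loosely structured markup often is.
SLOPPY = "<chapter><title>Blobs<p>First paragraph.<p>Second.</chapter>"


def paragraphs(markup):
    """Return the text of each <p> element, or None if the markup is not well-formed XML."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # A strict XML parser gives up here; an app would have to clean up or guess.
        return None
    return [p.text for p in root.iter("p")]


strict_result = paragraphs(STRICT)
sloppy_result = paragraphs(SLOPPY)

print(strict_result)
print(sloppy_result)
```

With the well-formed fragment the app gets a list of paragraphs it can build on. With the sloppy one it gets nothing, and any further work has to start with repair and guesswork.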

Yes, right now displaying the blob in what is little more than a web browser might be just about fine, but to make things faster and more useful, apps need to plug into the core level of an operating system. And the fewer barriers and potential headaches content creators add, surely the better. Besides, back-pedalling at this stage, just as EPUB is heading towards things like the advanced handling of indexes, seems a little crazy to say the least.

Further, although it might sound old-fashioned, books are meant to be laboured over and they give back all that we put into them. If we spend time on books, we pay attention to the details and it is these details that matter. Personally, I'm not interested in reading blogs that are, at the tap of a button, transformed into books. For me that is not a book: the content of a blog needs to be reappraised and re-examined if it is to become one.

Of course, it might be that others are shedding tears over these untapped blogs, but my personal view is that progress should not be held up (or even reversed) because of blogs (or material similar to them). Blogs are what they are, a record of the now, and that's what makes them brilliant, but packaged into a book most of them would look quite feeble.

To conclude

While this post began with a level head, balancing the "blob" and the "data" perspectives, ultimately it is difficult to sit on the fence in this debate. But deciding which side of the fence is yours is not straightforward. It might be that your heart votes "data" and your head votes "blob", or, as is the case for me, that your heart votes "data" and your head is undecided.

Whichever is the case for you, our differences mean that the blob/data debate will likely never end, and we'll keep taking steps forwards and backwards, never going as fast as we thought we would.
