
Indexing the Web, One Book at a Time

March 10, 2011

If you have ever wondered how much work it is for Google to index the Web, try a simpler project: indexing a book.

Over the past few days, Basex analyst Cody Burke and I worked on the index for my forthcoming book, Overload! How Too Much Information Is Hazardous To Your Organization.

Indexing is hard work. My brother Greg Spira, a veteran of numerous books who knows far more about the publishing world than I do, told me, more or less, that publishers hate producing indices, that most are not done particularly well, and that doing them properly is costly.

If you think it’s as simple as using a word processor’s index generation feature, you haven’t created an index recently. What that type of software produces is a concordance (essentially an alphabetical list of the words in a book and where they appear), which isn’t the same thing.
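To make the distinction concrete, here is a minimal sketch in Python of what a concordance amounts to: every distinct word, alphabetized, with the pages on which it appears. The function name build_concordance and the sample page texts are hypothetical, and real word processors differ in the details. Note what the sketch cannot do: it has no notion of subtopics, it lists filler words like “a” and “is” alongside meaningful terms, and it cannot point to a concept that the text discusses without naming it.

```python
import re
from collections import defaultdict


def build_concordance(pages: dict[int, str]) -> dict[str, list[int]]:
    """Map each lowercased word to the sorted list of pages it appears on."""
    locations: dict[str, set[int]] = defaultdict(set)
    for page_number, text in pages.items():
        # Words are runs of letters, allowing internal hyphens or apostrophes
        # (so "e-mail" stays one word).
        for word in re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower()):
            locations[word].add(page_number)
    # Alphabetize the words and sort each word's page numbers.
    return {word: sorted(locations[word]) for word in sorted(locations)}


# Hypothetical sample pages, keyed by page number.
pages = {
    12: "Information overload costs knowledge workers time.",
    47: "Time is a resource; e-mail consumes it.",
}
concordance = build_concordance(pages)
print(concordance["time"])  # [12, 47]
```

Every word gets an entry, and nothing more: deciding that “time” on page 47 belongs under a conceptual heading, or that “a” deserves no entry at all, is the human work that turns a concordance into an index.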

A high-quality index provides readers with quick access to information and improves the book’s overall usability. It anticipates what a reader would want to search for and surfaces relevant information. It should therefore accurately reflect the book’s contents and ideas without overburdening the reader with excess information.

In addition to cross-referencing and connecting related terms, an index also needs to break frequently used terms into subentries that show the different contexts in which they are used. For example, the term productivity had to be broken up into eight subentries, such as “defined,” “and e-mail,” and “of labor.” At other times, an entry in the index is more conceptual, such as “Time, as a resource.” Those exact words may not appear in the text, but the reader needs to be able to find where that topic is discussed.

Book indices are somewhat taken for granted, although you have no doubt found one useful in your own reading and research. Indeed, if until now you have never given the index at the back of a book much thought beyond using it, this may be food for thought.

An index is an ontology, a hierarchical structure of knowledge, and building that ontology is no mean feat: it requires thinking about information, be it a book or the entire Internet, in a nonlinear fashion.

As I worked on the index with Cody, it occurred to me how enormous a task search engine developers face. They essentially create software that does for the entire Internet what we were doing for a single book.

Consider the following words, written in 1465 about a book’s index:

“The index and figures of this book are indeed alone worth its whole price, because they make it much easier to use… so that everybody who wants to quickly find something that is contained in this little book can find it.”

Plus ça change, plus c’est la même chose (the more things change, the more they stay the same).

Jonathan B. Spira is CEO and Chief Analyst at Basex.

You can pre-order your copy of Overload! on amazon.com.
