

Tools @ Bookhouse

Creating an index from a list of words

Bookhouse’s in-house software for index creation can come in handy for books that do not have the time or budget for a human-made index.

The software works by searching the book for curated phrases and tracking where they occur in the text. For example, given the input ‘book publishing = publishing, books’, the software finds every occurrence of the phrase ‘book publishing’ and creates an index entry under ‘publishing, books’.

Each line of input takes this form:

    text to search for = index entry
    

Example of entries that could be used by the software:

    systemic inflammation = inflammation, systemic
    low-density lipoprotein = lipoproteins, low-density
    LDL = see low-density lipoprotein
    Thea Astley = Astley, Thea
    Astley = Astley, Thea
    low-density lipoprotein

It is a simple syntax, but a versatile one. Note in the example above how we have created two index entries for ‘low-density lipoprotein’: one appears under ‘lipoproteins, low-density’ and the other under ‘low-density lipoprotein’. There are also two search phrases for the author Thea Astley, both feeding the same index entry. There is no limit to the number of search phrases that can be associated with an index entry. As shown, see cross-references are possible, as are see also cross-references.
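To make the syntax concrete, here is a minimal Python sketch of how such a list might be read into (search phrase, index entry) pairs. It is not Bookhouse's parser: the function name `parse_phrase_list` is hypothetical, and treating a line without ‘=’ as a phrase indexed under itself is an assumption drawn from the bare ‘low-density lipoprotein’ line in the example.

```python
def parse_phrase_list(text):
    """Parse 'search phrase = index entry' lines into (phrase, entry) pairs.

    Assumption: a line with no '=' is a phrase indexed under itself.
    Cross-references ('see ...') are kept as plain entry text here.
    """
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        phrase, sep, entry = line.partition("=")
        phrase = phrase.strip()
        # No '=' found: the phrase doubles as its own index entry.
        entry = entry.strip() if sep else phrase
        pairs.append((phrase, entry))
    return pairs


sample = """
systemic inflammation = inflammation, systemic
low-density lipoprotein = lipoproteins, low-density
LDL = see low-density lipoprotein
Thea Astley = Astley, Thea
Astley = Astley, Thea
low-density lipoprotein
"""
pairs = parse_phrase_list(sample)
```

With the sample above, `pairs` holds six tuples. The two Astley phrases share the entry ‘Astley, Thea’, and the final line maps ‘low-density lipoprotein’ to itself.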

The list of search phrases does not have to be sorted alphabetically – this is handled automatically. This allows the person assembling the search phrases to read through the book as they normally would, adding to the list as they encounter relevant text. It does not matter if the same search phrase appears in the list multiple times.
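As a rough illustration of why order and repetition in the list do not matter, the sketch below builds an index from an unsorted list that contains a duplicate. Everything here is an assumption for illustration: the function name `build_index`, the input shape (a dict of page number to page text) and the case-sensitive literal match are not taken from Bookhouse's software.

```python
from collections import defaultdict


def build_index(pairs, pages):
    """Map each index entry to the sorted pages where any of its phrases occur.

    pairs: iterable of (search phrase, index entry) tuples
    pages: dict of page number -> page text (hypothetical input shape)
    """
    found = defaultdict(set)
    for phrase, entry in set(pairs):  # set() drops repeated lines
        for number, text in pages.items():
            if phrase in text:  # literal, case-sensitive match
                found[entry].add(number)
    # Entries sorted alphabetically (ignoring case), pages numerically.
    return {e: sorted(found[e]) for e in sorted(found, key=str.casefold)}


pages = {
    12: "Thea Astley wrote about small towns.",
    13: "Critics praised Astley for her style.",
    40: "Markers of systemic inflammation were raised.",
}
pairs = [
    ("systemic inflammation", "inflammation, systemic"),
    ("Thea Astley", "Astley, Thea"),
    ("Astley", "Astley, Thea"),
    ("Astley", "Astley, Thea"),  # duplicate line, harmless
]
index = build_index(pairs, pages)
# index == {"Astley, Thea": [12, 13], "inflammation, systemic": [40]}
```

Because the result is computed from the page texts each time, running the same function again on re-paginated text yields fresh page numbers without touching the phrase list.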

One of the most useful features of these indexes is that the page numbers are dynamic. If the text moves around during the proofing process, the numbers update automatically. This means that the list of search phrases can be created at any time during the production process. It is possible, for instance, to task authors with creating the list of search phrases while the proofreader is working in parallel. In fact, they could do this even earlier in the process, such as during copyediting. Unlike a normal index, the list does not have to wait until the last moment.

The complexity and depth of the index depend very much on the list of search phrases that is provided. A useful technique is to collaborate on the search phrases in a shared document, such as Google Docs. This allows the author, editor, copyeditor and publisher to all contribute to the same list. The more people providing input to the list, the more detailed the index will be.

In general, search phrases will be two or more words. Since the search phrases are literal, two or more words will prevent overlap of concepts. Single-word entries are certainly possible and make sense in the case of proper nouns, but they should be distinct within the context of the book. For instance, our inclusion of ‘Astley’ in the examples above would not be possible if Rick Astley and Thea Astley were mentioned in the same book.

Number spans are automatically contracted (elision). This means that no editorial intervention is required to format numbers from ‘145–146’ to ‘145–6’. Special circumstances, such as numbers 10–19, are also automatically handled.
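The contraction can be sketched in a few lines of Python. The rule below is an assumption, not Bookhouse's documented behaviour: it keeps only the trailing digits of the end number that differ from the start, and it reads the ‘10–19’ special case as the common convention of never cutting into a teen, so ‘112–113’ becomes ‘112–13’ rather than ‘112–3’. The function name `elide_range` is hypothetical.

```python
def elide_range(start, end):
    """Contract a page range, e.g. 145, 146 -> '145–6'.

    Assumed rule: drop the leading digits the two numbers share, but keep
    the tens digit when the end number finishes in 10-19.
    """
    s, e = str(start), str(end)
    if start == end:
        return s
    if len(s) != len(e):
        return f"{s}–{e}"  # e.g. 99–101 cannot be contracted
    # Find the first position where the digits differ.
    i = 0
    while s[i] == e[i]:
        i += 1
    # Teens: never strip the '1' of 10-19 in the last two digits.
    if len(e) >= 2 and e[-2] == "1":
        i = min(i, len(e) - 2)
    return f"{s}–{e[i:]}"


print(elide_range(145, 146))  # 145–6
print(elide_range(112, 113))  # 112–13
print(elide_range(10, 12))    # 10–12
```

House styles differ on details such as ranges that start on a round hundred, so the rule would need adjusting to match a given publisher's style.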

Proof and refine

A typeset index can be created and recreated at any time during production. This enables an editor to check on the characteristics of the index long before final proofs. It is possible to determine the number of pages that should be left for the index or to examine shortcomings in the search phrases. Generating a sample index early in the process can be useful to provide feedback to the author, encouraging them to revisit the text with a fresh perspective on the type of search phrases they can include.


Bookhouse’s index creation software is not intended to replace a human-made index of the kind you would get from a dedicated service such as Puddingburn. There is simply no substitute for the quality of such an index. The software works well where there is no budget for that kind of index but the book would still benefit from having one.

Index created from a list of search phrases

This index was generated entirely from a list of search phrases provided by the author. Note the inclusion of references to picture captions.