So You Think You Want To Index?

I’ve indexed both of the book projects I’ve been involved with: my solo-authored book, Networked Media, Networked Rhetorics (PSU, 2014), and Ancient Rhetorics + Digital Networks (Alabama, 2018), the volume I co-edited with Michele Kennerly. Frankly, I wouldn’t have it any other way. As I put it to someone else who was considering hiring out their indexing: Who knows your book better than you?

Below, I’ve broken down how I indexed these two books without, well, breaking down. Each time, it took me three pretty full workdays, so that’s what you’re signing up for. But at the end of it, you’ll have a pretty good index! A basic search will turn up a number of decent guides to indexing, so I won’t cover those here (although searching for “index your book” brings up the auto-suggestion “never index your own book,” so…).

Basically, my process tries to hybridize computer vision and human vision in order to create an index that is selective, reflective, and appropriately deflective (with apologies to KB). A word frequency program gives you an accurate reflection of the words used in the book; here, computer vision is far better than human vision. When I did this for Ancient Rhetorics + Digital Networks, I was surprised to see the following words pop up with some frequency: affordances, animals, beauty, circulation, demos, emergence, encounter, flow…well, you get the idea. A word frequency count can surface themes that you, the person most intimately involved with the book, can’t really perceive. That was the case here: it’s rather amazing to trace these words through the chapters, and I wouldn’t have identified them as indexable themes without the frequency count guiding me. What you need to make a good index is the combination of a “distant reading” of the text and your own immersion in it.
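For the curious, the kind of count Voyant produces in the steps below can be approximated in a few lines of Python. This is a minimal sketch, not Voyant’s actual method and not part of the process described here: the regular-expression tokenizer, the tiny stopword list, and the names `word_frequencies`, `STOPWORDS`, and `WORD` are all assumptions made for illustration.

```python
import re
from collections import Counter

# A deliberately tiny stopword list (an assumption for illustration);
# Voyant applies its own, much longer lists, which this does not reproduce.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is",
    "are", "was", "were", "be", "as", "at", "by", "it", "its", "this",
    "that", "with", "from", "not", "but", "we", "i", "they", "their",
}

# Runs of ASCII letters, allowing internal apostrophes and hyphens
# ("don't", "re-constructed"). Accented letters would split a word.
WORD = re.compile(r"[a-z]+(?:['’-][a-z]+)*")


def word_frequencies(text, top_n=1500):
    """Return up to top_n (word, count) pairs, most frequent first.

    The text is lowercased, stopwords and one-letter tokens are skipped,
    and ties keep the order in which words first appear.
    """
    words = WORD.findall(text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 1)
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = ("Circulation and flow: the demos encounters circulation, "
              "and circulation encounters the demos.")
    # Tab-separated output, similar in spirit to a frequency-list export.
    for word, count in word_frequencies(sample, top_n=5):
        print(f"{word}\t{count}")
```

To run it on a whole manuscript saved as plain text, something like `word_frequencies(open("manuscript.txt", encoding="utf-8").read())` would do; the filename is a placeholder. A real stopword list would need to be much longer than this one.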

That said, no one has ever come up to me and said, “By the clouds, your index. As I thumbed through it, tears began welling up in my eyes. All beauty shall henceforth be compared to these nine two-column pages.” I am, however, an index snob, and I have pretty strong feelings about what good and bad indexes look like. In other words, get a salt shaker out.

Here’s the process I use:

1. Head over to Voyant, a free and rather useful text analysis program. Upload your file or files and click “reveal,” which makes the magic happen. Make sure the file is clean, without copyedits or typesetting marks; otherwise you’ll get a lot of gunk in your frequency count (not a big deal, but annoying).

2. You’ll probably want to play around with the results; it’s rather fun to have your text so fundamentally de- and re-constructed. When you’re done with that, you’ll want to export the word frequency list. Click on “Terms,” on the left side of the tool bar, to see your word frequency list. Then click on the “export” icon, which only appears when you hover over the tool bar, just right of center near the question mark.

3. The Export menu will open up, and you’ll want to choose “export current data as tab separated values (text)” under “Export Current Data.” Click “Export.”

4. You’ll be presented with a text file in a window. Select all, then copy it.

5. Open up a new spreadsheet. I’ve used Google Sheets, but there’s no reason Excel wouldn’t work as well. Paste the frequency count you just copied. You can clean up the data a bit at this point (the text usually gets pasted into columns D–G, but you can easily move it over to column A, which the later steps assume).

6. Congrats. You’ve done the easy part. I’d go get a stiff drink before moving on to step 7.

7. What you’ll do next is make the backbone of your index by sifting through the most used words. For my solo-authored book, I looked at the 1,500 most frequently used words. For the co-edited volume, I looked at the top 3,000. Basically, scan the list until the words you see become commonplace and unhelpful for an index. This happened around 1,000 and 2,500 words, respectively, and I went 500 words deeper just for good measure. Edited volumes require a deeper dive into the word frequency count because they tend to be more topically diverse.

8. As you move through the terms, simply delete the ones you don’t want to keep. Err on the side of inclusion rather than deletion, as you can and will modify things later on. Once the initial culling is done, go to Add-Ons => Remove Blank Rows => Delete/hide blank rows/columns (this assumes you’ve installed the Remove Blank Rows add-on). This removes the blank rows left by your deletions and gives you a more orderly list of terms.

9. Add in author names [optional]. Both of my projects have wanted author names integrated into the index: any author named in the text or in a substantive footnote gets an entry. As far as I know, there isn’t a neat way to do this computationally (although, my gosh, comment here if there is!), so I just go through manually and add author names to the end of the column with the other terms. It doesn’t actually take that much time.

10. Alphabetize your terms. Go to Data => Sort Sheet by Column A, A-Z. There’s your index, give or take a few dozen terms!

11. Move the list into Word. Copy the column from Sheets, then, in a Word document, go to Edit => Paste Special => Unformatted Text. This should give you a long list of terms without the column formatting.

12. At this point, with a more manageable list of terms, you can start thinking about nesting terms under one another. You can probably reduce the list by a third to a half by grouping terms together, making sub-entries, etc. It’s a flexible process that you’ll keep coming back to as you see new ways of ordering. This is the part of the process that will make you wish you’d hired someone to do your index.

13. …or maybe it is this part? Search for each of the terms in the searchable PDF of the book that your press has provided, and dutifully record the page numbers on which each term appears. This is where you will spend the vast majority of your time; it requires careful attention to detail and a lot of breaks so you don’t mess up. As you are doing this, two things will happen. First, multi-word terms will reveal themselves to you: as you search for one word, you’ll see phrases that the word frequency count didn’t catch but that are important to index. The weakness of the computational method I’ve outlined is that it looks only at single words. In my experience, that has been an adequate, fairly comprehensive starting point, but it needs to be supplemented by human vision capable of understanding, in context, which phrases or terms of art deserve a place in the index. While there are ways to mine texts for multi-word terms, I’ve never tried them, and I’m uncertain how effective they would be, since a book contains lots of unique phrases that don’t need to be indexed. If anyone has insight on this, let me know in the comments! Second, you will come to realize that many of the terms on the initial list are not that substantive, or should be collapsed into other terms. Things will shift around a lot, then they’ll shift back, then back again. It’s just part of the process. Over time, some kind of order will begin to coalesce, and/or you will become so frustrated that you start saying things like “good enough for who it’s for.”

14. Once you have the index terms linked to page numbers, there is one more step: manually go through every page of the volume and look for panoramic themes and terms you might have missed while you were looking at the trees. This is the catch-all step. I’m happy to say that across both books, very few additional terms surfaced at this stage, but I think taking it is just due diligence.

15. Profit. Hopefully your royalties will cover the coffee it took to make the index.
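Steps 7 through 10 (take the top of the frequency list, delete what you don’t want, add author names, alphabetize) can also be done outside a spreadsheet. The sketch below is an illustration, not part of the process above: it assumes, hypothetically, that the exported list has the term in its first column and the count in its second, and the function names `parse_frequency_tsv` and `build_term_list` and the sample data are made up for the example, not part of Voyant or Google Sheets.

```python
import csv
import io


def parse_frequency_tsv(tsv_text):
    """Parse tab-separated text into (term, count) pairs, most frequent first.

    Assumes (hypothetically) the term is in the first column and the count
    in the second; rows whose second column isn't a whole number, such as
    a header row, are skipped.
    """
    pairs = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) >= 2 and row[1].strip().isdigit():
            pairs.append((row[0].strip(), int(row[1])))
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs


def build_term_list(pairs, top_n, drop, extra_terms=()):
    """Steps 7-10 in miniature.

    Keep the top_n most frequent terms, remove the ones listed in drop,
    add extra terms such as author names, and alphabetize without regard
    to case.
    """
    kept = {term for term, _ in pairs[:top_n] if term not in drop}
    kept.update(extra_terms)
    # Sort case-insensitively; the original string breaks ties so the
    # order is deterministic.
    return sorted(kept, key=lambda term: (term.casefold(), term))


if __name__ == "__main__":
    exported = "term\tcount\nflow\t12\ncirculation\t40\nsaid\t30\nbeauty\t9\n"
    pairs = parse_frequency_tsv(exported)
    terms = build_term_list(pairs, top_n=3, drop={"said"},
                            extra_terms=["Burke, Kenneth"])
    print("\n".join(terms))
```

Keeping a `drop` set rather than a keep-list mirrors the bias toward inclusion in step 8: anything you haven’t explicitly deleted stays in.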
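The PDF search, where most of those three days go, can be partly scripted too, though a script can’t replace reading each hit in context. This sketch assumes you have already extracted each page’s text into a list of strings (a PDF library such as pypdf can do that; the step is left out here). The functions `find_pages` and `format_locators` are hypothetical, as is the `first_page` parameter, which gives the printed page number of the first page in that list.

```python
import re


def find_pages(page_texts, term, first_page=1):
    """Return the page numbers on which term appears as a whole word or phrase.

    page_texts holds the extracted text of each page, in order. first_page
    (a hypothetical parameter) is the printed page number of page_texts[0],
    for PDFs whose page count and printed page numbers differ.
    Matching ignores case, and the words of a multi-word term may be
    separated by any whitespace, including line breaks.
    """
    words = [re.escape(word) for word in term.split()]
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    return [first_page + i
            for i, text in enumerate(page_texts)
            if pattern.search(text)]


def format_locators(pages):
    """Collapse ascending page numbers into locators: [3, 7, 8, 9] -> '3, 7–9'."""
    runs = []  # each run is [start, end]
    for page in pages:
        if runs and page == runs[-1][1] + 1:
            runs[-1][1] = page  # extend the current run
        else:
            runs.append([page, page])  # start a new run
    # Ranges are joined with an en dash.
    return ", ".join(str(start) if start == end else f"{start}\u2013{end}"
                     for start, end in runs)


if __name__ == "__main__":
    pages = [
        "Circulation matters here.",
        "Nothing relevant.",
        "We trace digital\nnetworks and circulation.",
        "Circulation again, and the demos.",
        "Demosthenes, not the demos, is cited.",
    ]
    hits = find_pages(pages, "circulation", first_page=7)
    print("circulation,", format_locators(hits))
```

Collapsing every run of consecutive pages into a range is a simplification: whether 7–9 is one sustained discussion or three passing mentions is exactly the kind of judgment the search step calls for. Words hyphenated across a line break in the extracted text will also be missed.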
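As for mining multi-word terms, one simple approach is to count two-word sequences (bigrams) that recur, skipping any pair that contains a stopword. This is a hedged sketch, untested on either book, with an assumed stopword list and a hypothetical `candidate_phrases` function; expect it to surface plenty of phrases that don’t belong in an index, which is exactly the concern raised in the search step.

```python
import re
from collections import Counter

# An assumed, deliberately short stopword list.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is",
    "are", "was", "be", "as", "by", "it", "this", "that", "with",
}
WORD = re.compile(r"[a-z]+(?:['’-][a-z]+)*")
# Pairs are not allowed to straddle these punctuation marks.
BREAKS = re.compile(r"[.,;:!?()\"“”]")


def candidate_phrases(text, min_count=2):
    """Return (phrase, count) pairs for recurring two-word sequences.

    A pair is kept only if neither word is a stopword and it occurs at
    least min_count times. The output is a list of candidates to review
    by hand, not a list of index entries.
    """
    pairs = Counter()
    for chunk in BREAKS.split(text.lower()):
        words = WORD.findall(chunk)
        for first, second in zip(words, words[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                pairs[f"{first} {second}"] += 1
    return [(phrase, count) for phrase, count in pairs.most_common()
            if count >= min_count]


if __name__ == "__main__":
    sample = ("Digital networks matter. Digital networks change rhetoric. "
              "The demos flows; networks change.")
    for phrase, count in candidate_phrases(sample):
        print(f"{phrase}\t{count}")
```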

No doubt this is a slightly idiosyncratic approach. I’d love to hear different strategies in comments or via email.