What is a Corpus?
A corpus is the collection of texts or documents you wish to analyze. It is best to store each text as its own file, rather than combining the text of every document into a single file.
Things to Consider When Your Corpus Is a Single Book
When building your corpus, think about which parts of the whole you want to analyze. With a resource such as Voyant, you can apply text analysis tools to the full corpus and also select individual parts to analyze on their own. This is standard practice when the corpus is made up of separate works, but a single work can also be broken up into smaller parts.
Let's look at L. Frank Baum's The Wonderful Wizard of Oz. Below are (1) a plain text file of the entire book and (2) a ZIP folder containing a .txt file for each chapter.
By putting each chapter in its own file, you are able to analyze a specific chapter in the context of the full text.
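If you only have the single-file version of a book, the per-chapter files can be produced with a short script. The following is a minimal sketch in Python, not a definitive method. It assumes that every chapter begins on a line starting with "Chapter" followed by a Roman or Arabic numeral, which may not hold for your edition. The function names and the output naming scheme (chapter_01.txt, chapter_02.txt, and so on) are illustrative choices.

```python
import re
from pathlib import Path

# Assumption: each chapter starts on a line beginning with "Chapter" followed
# by a Roman or Arabic numeral. Adjust the pattern to match your edition.
CHAPTER_HEADING = re.compile(
    r"^chapter[ \t]+(?:[ivxlc]+|\d+)\b", re.IGNORECASE | re.MULTILINE
)


def split_into_chapters(text):
    """Return a list of chapter strings, each starting at its heading line."""
    starts = [match.start() for match in CHAPTER_HEADING.finditer(text)]
    # Each chapter runs from its heading to the next heading (or end of text).
    ends = starts[1:] + [len(text)]
    return [text[start:end].strip() for start, end in zip(starts, ends)]


def write_chapters(chapters, out_dir):
    """Write each chapter to out_dir as chapter_01.txt, chapter_02.txt, ..."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for number, chapter in enumerate(chapters, start=1):
        target = out_path / f"chapter_{number:02d}.txt"
        target.write_text(chapter + "\n", encoding="utf-8")
        written.append(target)
    return written
```

To use it, read the full book into a string, for example with Path("wizard_of_oz.txt").read_text(encoding="utf-8") (a hypothetical file name), pass the string to split_into_chapters, and pass the result to write_chapters along with an output folder. Anything before the first chapter heading, such as a title page, is left out. If the file has a table of contents that repeats the chapter headings, those lines will also match, so remove the table of contents first or check the output files by hand.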
Analysis sample for full text in one file:
Analysis sample for full text split into individual chapters:
Notice that in the second sample, each chapter is selectable for analysis.