Subcorpus from text types


Hi, this is Ondřej from Sketch Engine and in this screencast I will show you how to divide a corpus into smaller parts called subcorpora. Each corpus can be divided into an unlimited number of subcorpora. Subcorpora can be analysed separately or contrasted with each other. Creating a subcorpus does NOT use any of your storage space for user corpora.

A subcorpus can be created in three different ways: from a concordance, from text types or from a definition file. This screencast demonstrates the second option, which creates a subcorpus from a combination of text types, also called metadata or annotation. For example, if the documents in your corpus are annotated for publication date and country of origin, you can build a subcorpus from all documents published in Canada in 2013.

To build a subcorpus from text types, go to corpus management. Alternatively, the advanced tab of each tool features a subcorpus selector along with the new subcorpus button. Name your subcorpus, select the text types and click CREATE SUBCORPUS. Now your subcorpus is ready. It is not possible to add more data to the subcorpus once it has been created.

To use the subcorpus, look for the subcorpus selector on the advanced tab of each tool. Selecting the subcorpus restricts the analysis to that subcorpus.

To see the list of subcorpora, go to the corpus information page or to corpus management. The number of tokens is exact; the number of words is only an estimate. The percentage indicates the part of the whole corpus taken up by the subcorpus. Use this icon to delete the subcorpus. Deleting a subcorpus will NOT delete any data from the corpus.

Please look at the other videos showing more options for building a subcorpus. To try Sketch Engine, register for a free trial on sketchengine.eu. Thank you for watching and don’t forget to subscribe. Use the comments for any questions you might have.
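
For readers who think in code, the text-type selection described in this screencast can be pictured as a simple metadata filter. The sketch below is only an illustration under stated assumptions: it is not Sketch Engine's API or its internal implementation, and the names `Document`, `select_subcorpus` and `token_count`, as well as the sample documents, are hypothetical. It selects the documents whose text types match every chosen value (country Canada and year 2013, as in the example above) and computes the share of the whole corpus by tokens.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical stand-in for a corpus document: its text types
    # (metadata) and its tokens. Not a Sketch Engine data structure.
    text_types: dict[str, str]
    tokens: list[str]


def select_subcorpus(corpus, **wanted):
    """Return the documents whose text types match ALL requested values."""
    return [
        doc for doc in corpus
        if all(doc.text_types.get(name) == value for name, value in wanted.items())
    ]


def token_count(docs):
    """Total number of tokens in a list of documents."""
    return sum(len(doc.tokens) for doc in docs)


# Made-up sample data, for illustration only.
corpus = [
    Document({"country": "Canada", "year": "2013"}, "the cat sat on the mat .".split()),
    Document({"country": "Canada", "year": "2014"}, "snow fell overnight .".split()),
    Document({"country": "Ireland", "year": "2013"}, "it rained again .".split()),
    Document({"country": "Canada", "year": "2013"}, "maple syrup season began early .".split()),
]

# Combination of text types: published in Canada AND in 2013.
subcorpus = select_subcorpus(corpus, country="Canada", year="2013")

# Share of the whole corpus taken up by the subcorpus, measured in tokens.
share = 100 * token_count(subcorpus) / token_count(corpus)
print(f"{len(subcorpus)} documents, {token_count(subcorpus)} tokens, {share:.1f}% of the corpus")
```

In this sketch the subcorpus is just a selection over the existing documents, so the original `corpus` list is left untouched. That mirrors the points made above that a subcorpus takes no extra storage and that deleting it removes no data from the corpus.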
