Natural Language Annotation for Machine Learning

This book is intended as a resource for people who are interested in using computers to help process natural language. A “natural language” is any language spoken by humans, either currently (e.g., English, Chinese, Spanish) or in the past (e.g., Latin, Greek, Sanskrit). “Annotation” refers to the process of adding metadata to a text in order to augment a computer’s ability to perform Natural Language Processing (NLP). In particular, we examine how information can be added to natural language text through annotation in order to increase the performance of machine learning algorithms: computer programs designed to extrapolate rules from the annotations provided on a set of texts and then apply those rules to unannotated texts later on.