Description: Corpus of Contemporary American English (COCA) n-grams data: 2-, 3-, and 4-word sequences with their frequencies. These n-grams are based on the largest publicly available, genre-balanced corpus of English, the 450-million-word COCA. With this data you can carry out powerful queries offline, without needing to access the corpus through the web interface. Each free n-grams file contains approximately the 1,000,000 most frequent n-grams from COCA. To download the files, you first need to enter your name and email. "Case sensitive" means that, for example, Bush and bush are separate entries. The n-grams with parts of speech let you find, for example, all of the tens of thousands of NOUN + NOUN sequences, or run any other search that refers to a word's part of speech.
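As a rough illustration of the kind of offline query described above, here is a minimal Python sketch that pulls NOUN + NOUN sequences out of a part-of-speech-tagged 2-gram list. Several things in it are assumptions rather than facts from the description: the tab-separated column layout (frequency, word 1, word 2, tag 1, tag 2), the convention that noun tags begin with `nn`, the function names, and the sample rows, which are invented for the example and are not real COCA counts. Check the downloaded files and the tag documentation before relying on any of it.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Bigram(NamedTuple):
    freq: int
    words: Tuple[str, str]
    tags: Tuple[str, str]


def read_bigrams(lines: Iterable[str]) -> Iterator[Bigram]:
    """Parse rows in the ASSUMED layout: freq <TAB> w1 <TAB> w2 <TAB> pos1 <TAB> pos2."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Skip blank lines, header rows, and anything that does not fit the layout.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        freq, w1, w2, p1, p2 = fields
        yield Bigram(int(freq), (w1, w2), (p1, p2))


def noun_noun(bigrams: Iterable[Bigram], noun_prefix: str = "nn") -> List[Bigram]:
    """Keep bigrams whose two tags both start with noun_prefix, most frequent first.

    The "nn" prefix is an assumption about the tagset, not something the
    description states.
    """
    hits = [
        b for b in bigrams
        if all(tag.lower().startswith(noun_prefix) for tag in b.tags)
    ]
    return sorted(hits, key=lambda b: b.freq, reverse=True)


# Invented sample rows for illustration only (not real corpus frequencies).
SAMPLE = (
    "120\thigh\tschool\tjj\tnn1\n"
    "95\thealth\tcare\tnn1\tnn1\n"
    "300\tthe\tschool\tat\tnn1\n"
    "210\tnews\tmedia\tnn1\tnn\n"
)

if __name__ == "__main__":
    for b in noun_noun(read_bigrams(SAMPLE.splitlines())):
        print(b.freq, " ".join(b.words))
```

To run the same filter on a downloaded file, an open file object can be passed to `read_bigrams` in place of the sample lines, since it only needs an iterable of text lines. Matching the tags case-insensitively is a design choice that keeps the filter independent of how the tags are cased in the file; the words themselves are left untouched, so case-sensitive entries such as Bush and bush stay distinct.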
