“The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English from the later part of the 20th century, both spoken and written.
“The written part of the BNC (90%) includes, for example, extracts from regional and national newspapers, specialist periodicals and journals for all ages and interests, academic books and popular fiction, published and unpublished letters and memoranda, school and university essays, among many other kinds of text.
“The spoken part (10%) consists of orthographic transcriptions of unscripted informal conversations (recorded by volunteers selected from different age, region and social classes in a demographically balanced way) and spoken language collected in different contexts, ranging from formal business or government meetings to radio shows and phone-ins.”
The BNC was compiled in the 1990s.
Lancaster University has launched a new corpus project that covers more recent language data: the “British National Corpus 2014”.
“The British National Corpus 2014 is a large collection of samples of contemporary British English language use, gathered from a range of real-life contexts. The BNC2014, which contains millions of words of spoken and written English, is being gathered by Lancaster University and Cambridge University Press, and is a new resource for research and teaching on contemporary British English. It is the successor to the original British National Corpus, which was gathered in the early 1990s. By comparing the two corpora, researchers will be able to shed light on how British English may have changed over the last two decades.”
The “spoken” component of the BNC2014 is covered, for example, in this post on the corpus linguistics blog Around the world.