Speakers in native-language conversations converge towards each other’s speaking behaviour. The LUCEA corpus (Longitudinal University College Utrecht Corpus of English Accents) was collected to study this type of phonetic convergence in a multilingual environment. Students and teachers at University College Utrecht (UCU) come from many countries and speak many native languages, yet they all use English as the lingua franca on campus. Hence, phonetic convergence may result in a unique international variety of English, influenced by the speakers’ native languages and accents. A similar international English accent may also develop in international finance, in military alliances, and in other multilingual teams communicating in English.

The LUCEA corpus consists of high-quality recordings from four consecutive cohorts of students (2010 to 2013). Students were followed longitudinally, with up to five recordings over their three-year study at UCU. Each interview consists of read speech and spontaneous monologues in the participant’s native language (L1) and in English, plus a brief dialogue with the interviewer in English. Recordings were made in a quiet office, using a headset microphone and an array of microphones placed around the participant.

The corpus contains about 1095 interviews with 283 unique students (139 of whom contributed all five interviews). Each interview contains about 20 minutes of speech. The speech corpus is augmented with participants’ responses to entry and exit questionnaires, and with supplementary data about the participants and about each recording. In total, the corpus contains about 3 TB (about 3000 GB) of audio data.

The project was funded by University College Utrecht, by the Utrecht institute of Linguistics OTS, and by Clarin-NL.

For more information, see the presentations and papers about this corpus, or contact the principal investigators.