Linguistic typology and learnability in second language

A research project funded by the Leverhulme Trust


Our multilingual capacity is central to human cognition. Adult second language learning, in particular, sheds light on learning across the lifespan, with far-reaching consequences for language learning practice as well as intercultural communication. This project investigates how the mother tongue (L1) influences second language (L2) learning by examining the linguistic distance between L1 and L2: whether the two languages belong to the same language family and share typological features, or whether they are typologically unrelated. It does so by using big data from online language learning, tracking learner progress in English as a second language (L2 English) through the writings of thousands of learners from around the world. Large-scale L2 English data from online language learning provide an empirical bridge between developmental lab-based second language acquisition research and language proficiency in education. The research will help educators tailor teaching to learners' language backgrounds, with particular benefits for learners whose L1 is linguistically distant from English (e.g. Chinese, Japanese), at a time when language skills are becoming increasingly important [1].

We propose to exploit the recently developed Parametric Comparison Method (PCM) [2,4] from the subfield of linguistic typology and genealogy to provide a typological framework for the investigation of linguistic distance and its impact on L2. We will investigate 10 typologically diverse mother tongues (L1s), such as Brazilian Portuguese, Russian, Chinese and Arabic, across thousands of language learners in the EF-Cambridge Open Language Database (EFCAMDAT) [10], an open access corpus developed at Cambridge. EFCAMDAT consists of L2 writings submitted to the online school of EF Education First, an international school of English as a foreign language. Available at http://corpus.mml.cam.ac.uk/efcamdat, EFCAMDAT is the largest open access corpus of its kind, with 1.2 million scripts totalling 71.8 million words, and it continues to grow. Our objective is to incorporate the impact of linguistic distance into current L2 learnability theories.






Research Questions

Learner Profiles

  1. What is the impact of linguistic distance between L1 and L2 on L2 learning?
  2. Do typological L1 properties affect the acquisition of individual features? For example, is the acquisition of definiteness affected by further differences/similarities between L1 and L2 (e.g. number marking, mass/count distinction)?
  3. How are features that cannot be transferred from L1 acquired? Are there unlearnable features?
  4. Which types of features show persistent L1 effects even at advanced proficiency and why?
  5. Can we model typologies of L2 grammars on the basis of L1 typologies? How can we incorporate such L2 typologies into current theories of L2 learnability?


To measure L1-L2 distance, we adopt the Parametric Comparison Method (PCM) [2-4]. Following the Principles and Parameters framework [15], PCM uses binary parameters to model cross-linguistic variation. Linguistic distance is measured through (the coefficient of) identities and differences in parameter values. PCM yields measures refined enough to differentiate between as many as 28 languages and to distinguish successfully between language genealogies [4]. Importantly, PCM defines implicational relations between parameters, allowing us to capture clusters of superficial properties under a few parameters.

We will use the PCM to characterise 10 typologically diverse L1s (Spanish, French, Italian, Brazilian Portuguese, German, Chinese, Russian, Japanese, Korean, Arabic, Turkish). For instance, based on 63 (nominal) parameters, the PCM distance between English and German is 0.1111 (40 identical parameter settings, 5 differences, 18 not applicable), between English and Spanish 0.1818 (36:8), between English and Russian 0.3143 (24:11), and between English and Arabic 0.4048 (24:16). We will document the impact of PCM distance on individual features in morphology and syntax (e.g. articles, verbal endings, subcategorisation frames), on syntactic complexity, and on the interface with semantics and discourse (aspectual distinctions, anaphoric marking). A variety of parameter types will be used: mega, meso and micro [16].
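As a minimal sketch of how such distance values can be derived from the parameter counts, the snippet below assumes that distance is the number of differing parameter values divided by the number of comparable parameters (identities plus differences), with not-applicable parameters left out. This formula and the name pcm_distance are illustrative assumptions, not the project's own implementation; under this assumption, the counts quoted above for German, Spanish and Russian reproduce the quoted distances.

```python
def pcm_distance(identities: int, differences: int) -> float:
    """Assumed normalised distance: differences / (identities + differences).

    Parameters that are not applicable to the language pair are excluded,
    so only comparable parameters enter the denominator.
    """
    comparable = identities + differences
    if comparable == 0:
        raise ValueError("no comparable parameters for this language pair")
    return differences / comparable


# (identities, differences) between English and each language, as quoted above
counts = {
    "German": (40, 5),
    "Spanish": (36, 8),
    "Russian": (24, 11),
}

for language, (same, different) in counts.items():
    print(f"English-{language}: {pcm_distance(same, different):.4f}")
```

Dividing by the comparable parameters only, rather than by all 63, keeps the measure from shrinking merely because many parameters are irrelevant for a given pair.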

To obtain a dataset rich enough to investigate typological effects across proficiency levels with significant learner numbers, we will use EFCAMDAT. EFCAMDAT contains 128 distinct tasks across the proficiency spectrum, drawing on learners across 170 nationalities. The empirical research will consist of a comprehensive corpus analysis exploiting EFCAMDAT (and smaller corpora such as ICLE [17]). Psycholinguistic experiments will distinguish avoidance strategies [18,19] or possible data sparseness in production from true unlearnability.

Outputs

The main output will be a book synthesizing the typological perspective with current learnability hypotheses and presenting an empirical investigation of unique scope.


  • Chen, X., Alexopoulou, T., & Tsimpli, I. (2019). L1 effects on the development of L2 subordination. Poster presented at the 29th conference of the European Second Language Association (EuroSLA 29). Lund, Aug. 28-31.
  • Chen, X., Alexopoulou, T., & Tsimpli, I. (in prep). Automatic extraction of subordinate clauses and its application in second language acquisition research.
  • Chen, X. (in prep). Microservices and intelligent language tutoring systems: Implementing a subordinate clause exercise generator with microservice architecture.


[1] Born Global, 2014, A British Academy Project on Languages and Employability, interim report, http://www.britac.ac.uk/born-global

[2] Longobardi and Guardiano, 2009, Evidence for syntax as a signal of historical relatedness, Lingua, 119.1679-1706.

[3] Publications from the ERC grant Meeting Darwin's last challenge: toward a global tree of human languages and genes (LanGeLin) (2012-2017) to Professor G. Longobardi, Univ. of York, https://www.york.ac.uk/language/research/projects/langelin/#tab-3

[4] Guardiano and Longobardi, 2017, Parametric comparison. In Ian Roberts (ed.), The Oxford Handbook of Universal Grammar. Oxford: Oxford University Press.

[5] Foley and Flynn, 2013, The role of the native language, in The Cambridge Handbook of Second Language Acquisition, p. 94-113.

[6] Jarvis and Pavlenko, 2007, Crosslinguistic Influence in Language and Cognition, New York: Routledge.

[7] Rankin, 2011, The transfer of V2: inversion and negation in German and Dutch learners of English, International Journal of Bilingualism, 16.1.139-158.

[8] Ionin and Montrul, 2010, The role of L1 transfer in the interpretation of articles with definite plurals in L2 English, Language Learning, 60.4.877-925.

[9] Murakami and Alexopoulou, 2015, L1 influence on the acquisition order of the English grammatical morphemes: a learner corpus study, Studies in Second Language Acquisition, 38.3.365-401.

[10] Geertzen, Alexopoulou and Korhonen, 2013, Automatic linguistic annotation of large scale L2 databases: The EF-Cambridge Open Language Database (EFCAMDAT), Proceedings of the 31st Second Language Research Forum, Cascadilla Press, 240-254.

[11] Hawkins, 2007, Do second language learners acquire restrictive relative clauses on the basis of relational or configurational information? The acquisition of French subject, indirect object and genitive restrictive relative clauses by second language learners, Second Language Research, 156-188.

[12] Slabakova, 2009, What is easy and what is hard to acquire in a second language?, Proc. of GASLA 2009, Cascadilla Press, 280-294.

[13] Dekydtspotter and Renaud, 2014, On second language processing and grammatical development; The parser in second language acquisition, Linguistic Approaches to Bilingualism, 4(2), 131-165.

[14] Tsimpli and Dimitrakopoulou, 2007, The interpretability Hypothesis: evidence from wh-interrogatives in second language acquisition, Second Language Research, 23.2.215-242.

[15] Chomsky and Lasnik, 1993, Principles and Parameters Theory, in Syntax: An International Handbook of Contemporary Research, Berlin: de Gruyter.

[16] Biberauer and Roberts, 2015, The Clausal Hierarchy, Features and Parameters. In U. Shlonsky (ed.), Beyond Functional Sequence: The Cartography of Syntactic Structures, Volume 10. Oxford: Oxford University Press, 295-313.

[17] Granger, Dagneaux and Meunier, 2002, International Corpus of Learner English, Presses Universitaires de Louvain.

[18] Schachter, 1974, An error in error analysis, Language Learning, 24.205-214.

[19] Ventura and Myles, 2015, The importance of task variability in the design of learner corpora for SLA research, International Journal of Learner Corpus Research, 1.1.58-95.


This project is funded by the Leverhulme Trust Research Project Grant "Linguistic typology and learnability in second language".

We collaborate with: EF Education First