Large-scale diversity estimation through surname origin inference

In 2018, Antoine Mazières and Camille Roth published the article “Large-Scale Diversity Estimation Through Surname Origin Inference” in the Bulletin of Sociological Methodology. Antoine recently wrote an informal debriefing of the study (in French), which gives us the occasion to present it on this site.

The abstract of the article is as follows:
The study of surnames as both linguistic and geographical markers of the past has proven valuable in several research fields spanning from biology and genetics to demography and social mobility. This article builds upon the existing literature to conceive and develop a surname origin classifier based on a data-driven typology. This enables us to explore a methodology to describe large-scale estimates of the relative diversity of social groups, especially when such data is scarcely available. We subsequently analyze the representativeness of surname origins for 15 socio-professional groups in France.
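To make the idea of "representativeness" more concrete, here is a minimal sketch of one possible way to compare the surname origins found in a group with those of a reference population. It is not taken from the article: the function names, the ratio used as a measure, and the toy labels "A", "B" and "C" are all assumptions made for illustration, and the sketch supposes that each surname has already been assigned an origin label by some classifier.

```python
from collections import Counter


def origin_shares(origins):
    """Return the share of each origin label in a list of labels."""
    counts = Counter(origins)
    total = sum(counts.values())
    return {origin: count / total for origin, count in counts.items()}


def representation_ratios(group_origins, reference_origins):
    """Hypothetical measure: share of an origin in the group divided by
    its share in the reference population. A value above 1 suggests
    over-representation, a value below 1 under-representation."""
    group = origin_shares(group_origins)
    reference = origin_shares(reference_origins)
    return {
        origin: group.get(origin, 0.0) / share
        for origin, share in reference.items()
    }


# Made-up toy labels, not data from the study.
reference = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
group = ["A"] * 8 + ["B"] * 2

ratios = representation_ratios(group, reference)
for origin in sorted(ratios):
    print(origin, round(ratios[origin], 2))
```

A ratio is only one option among many; the article itself should be consulted for the measure the authors actually use.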