Frequently Asked Questions

What is Onolytics?

Onolytics is a research-based methodology that classifies the ethnic origins of database populations using each individual’s first name and surname. This methodology is probabilistic, not deterministic, and is accessible through a software tool.

What is the science behind Onolytics?

Onolytics is based on over a decade of academic research by Prof. Pablo Mateos into the ethnic origin of personal names. It was the subject of his Ph.D. dissertation and is backed by years of data analysis across the world, applied within a variety of disciplines in the academic, public and private sectors. Onolytics’ key discovery lies in detecting cliques of names by automatically applying network clustering techniques to the linkages between first names and surnames in populations across the world. It does so using whole-population registers such as electoral registers, telephone directories, historical census records, name frequency statistics from civil registers and publicly available datasets (such as the US Social Security database).
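As a rough illustration of the idea of linking first names and surnames in a network, the sketch below groups names that are connected through shared bearers. It is not the Onolytics algorithm: it uses plain connected components as a simple stand-in for network clustering, and the function name `cluster_names` and the toy register are invented for this example.

```python
from collections import defaultdict


def cluster_names(people):
    """Group forenames and surnames that are linked by shared bearers.

    `people` is an iterable of (forename, surname) pairs. Each pair links a
    forename node to a surname node; the connected components of that graph
    are returned as a list of sets. Connected components are an assumed,
    deliberately simple stand-in for real network clustering.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for forename, surname in people:
        union(("forename", forename.lower()), ("surname", surname.lower()))

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())


# Hypothetical toy register, invented for illustration only.
register = [
    ("Maria", "Garcia"),
    ("Jose", "Garcia"),
    ("Jose", "Lopez"),
    ("Wei", "Chen"),
    ("Li", "Chen"),
]
clusters = cluster_names(register)
```

In this toy register the names fall into two clusters, because no person links the two sets of names. Real population registers are far denser, which is why a proper clustering technique is needed rather than simple connectivity.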

How was Onolytics developed?

Onolytics is the brainchild of Prof. Pablo Mateos. It began as his Ph.D. research in 2004, was first delivered as a Ph.D. thesis in 2007 and was subsequently published as a book monograph in 2014. Ongoing research informs and augments the Onolytics algorithm as new data and patterns are uncovered.

To read more about how the Onolytics methodology (formerly known as Onomap) was developed, see these key texts:

  • Mateos, Webber and Longley (2007) The Cultural, Ethnic and Linguistic Classification of Populations and Neighbourhoods using Personal Names, CASA Working Paper 116, Centre for Advanced Spatial Analysis, University College London. (Full paper)
  • Mateos, P. (2007) A review of name-based ethnicity classification methods and their potential in population studies, Population Space and Place, 13 (4): 243-263 (Full paper)
  • Mateos, P. (2007) An ontology of ethnicity based upon personal names: with implications for neighbourhood profiling. Doctoral thesis, University of London (PhD thesis)
  • Mateos, P., Longley, P.A. and O’Sullivan, D. (2011) Ethnicity and Population Structure in Personal Naming Networks. PLoS ONE (Public Library of Science) 6 (9) e22943 (Article)
  • Mateos, P. (2014) Names, Ethnicity and Populations: Tracing Identity in Space. Springer: Heidelberg (Springer, Amazon, Barnes&Noble)

How does Onolytics methodology work?

The Onolytics methodology is based on a new ontology of ethnicity that combines some of the multidimensional facets encapsulated in the diversity of people’s names: language, religion, geographical region, and culture. It was developed using data collected at very fine temporal and spatial scales and made available, subject to safeguards, at the level of the individual. Each name in this dataset is assigned its most probable cultural, ethnic and linguistic (CEL) group of origin; these groups are termed Onolytics Types. When a user processes a database of individuals’ names through the Onolytics software, the algorithm establishes the most probable origin of each person based on the origins of their first name and surname, following a set of complex rules and probability rates.
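The actual rules and probability rates are proprietary and are not described here. Purely as a sketch of what "most probable origin from first name and surname" could look like, the example below assumes a naive rule: multiply the per-type probabilities of the two name parts and renormalise. The function name, the lookup tables and the labels `TYPE_A` and `TYPE_B` are all hypothetical.

```python
def most_probable_type(forename, surname, forename_probs, surname_probs):
    """Return (type, score) for the highest combined score, or None.

    Assumption (not the Onolytics rule set): the combined score of a type is
    the product of its forename and surname probabilities, renormalised so
    the scores sum to 1. A name part missing from its table adds no evidence.
    """
    f = forename_probs.get(forename.lower())
    s = surname_probs.get(surname.lower())
    if f is None and s is None:
        return None
    if f is None:
        combined = dict(s)
    elif s is None:
        combined = dict(f)
    else:
        combined = {t: f[t] * s[t] for t in f.keys() & s.keys()}
        if not combined:
            # The two name parts share no candidate type; this sketch
            # arbitrarily falls back on the surname evidence.
            combined = dict(s)
    total = sum(combined.values())
    if total == 0:
        return None
    best = max(combined, key=combined.get)
    return best, combined[best] / total


# Hypothetical probability tables, invented for illustration only.
forename_probs = {"alex": {"TYPE_A": 0.6, "TYPE_B": 0.4}}
surname_probs = {"example": {"TYPE_A": 0.2, "TYPE_B": 0.8}}

result = most_probable_type("Alex", "Example", forename_probs, surname_probs)
unknown = most_probable_type("Nobody", "Unlisted", forename_probs, surname_probs)
```

Here the surname evidence outweighs the forename evidence, so `result` names `TYPE_B`, while a name absent from both tables yields `None`. The output stays a probability rather than a certainty, in line with the probabilistic nature of the methodology.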

How is Onolytics useful to my organization?

There is a growing need to understand the nature and detailed composition of ethnic groups in today’s increasingly multicultural societies. Ethnicity classifications are personally sensitive and can be the subject of public controversy. People may not feel comfortable self-reporting their ethnicity and, therefore, answers are prone to interviewer or respondent bias. Poor quality and/or a lack of availability of ethnicity classifications in routine administrative records and transactional datasets has serious consequences in terms of failing to meaningfully understand the diversity within populations.

Onolytics allows you to classify your populations (i.e. users, citizens, patients, or customers) using an academically validated methodology with sufficient accuracy and at a fraction of the cost and hassle of alternative methods (such as asking for self-assigned ethnicity in surveys or forms, or attempting to re-contact people in historical records). Furthermore, once you create a baseline measurement of the diversity and equity in access to your services, you can consistently monitor progress against that quantifiable baseline, using a neutral and defensible methodology that is not skewed by changes in data collection or the subjective judgment of the intermediaries involved.

What is Onolytics Taxonomy?

Onolytics classifies names into groups of common cultural, ethnic and linguistic (CEL) origin using first names and surnames. The Onolytics classification is organized as a hierarchical pyramid with three levels of detail of ethnic groups, termed the Onolytics Taxonomy.

  • At the base of this hierarchy there are a total of 185 independently assigned categories termed Onolytics Types, which represent the smallest building blocks of the Onolytics Taxonomy.
  • These Types are organized into 66 Onolytics Subgroups.
  • The Subgroups then nest together into 16 Onolytics Groups.

You can tailor this hierarchy by creating your own aggregations of Onolytics Types into more meaningful groupings, depending on the specific characteristics of your application.
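One way to picture the three-level hierarchy and a user-defined aggregation is as a pair of lookup tables with an optional override. The sketch below is an assumption about how such a structure could be represented, not a description of the Onolytics software; all labels and function names are hypothetical.

```python
# Hypothetical labels for illustration; the real Onolytics Taxonomy has
# 185 Types, 66 Subgroups and 16 Groups, whose names are not listed here.
TYPE_TO_SUBGROUP = {
    "type_a1": "subgroup_a",
    "type_a2": "subgroup_a",
    "type_b1": "subgroup_b",
    "type_c1": "subgroup_c",
}
SUBGROUP_TO_GROUP = {
    "subgroup_a": "group_1",
    "subgroup_b": "group_1",
    "subgroup_c": "group_2",
}


def roll_up(onolytics_type, level="group"):
    """Return the Subgroup or Group that a Type nests into."""
    subgroup = TYPE_TO_SUBGROUP[onolytics_type]
    if level == "subgroup":
        return subgroup
    if level == "group":
        return SUBGROUP_TO_GROUP[subgroup]
    raise ValueError(f"unknown level: {level!r}")


def custom_grouping(onolytics_type, overrides, level="group"):
    """Use a user-defined aggregation where one exists, else the standard roll-up."""
    if onolytics_type in overrides:
        return overrides[onolytics_type]
    return roll_up(onolytics_type, level)


# A user-defined aggregation that pulls one Type into its own segment.
my_overrides = {"type_c1": "my_segment"}
standard = roll_up("type_b1")
tailored = custom_grouping("type_c1", my_overrides)
```

Because Types are the smallest building blocks, any custom grouping can be expressed as a mapping from Types to your own labels, with the standard Subgroups and Groups as the fallback.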

How is Onolytics delivered?

Onolytics is currently available as standalone software for maximum privacy. Furthermore, an API integration is currently under development.

Onolytics is a Java-based software program and requires the Java runtime environment to be installed on the client machine. As such, it is operating-system independent and runs on Windows, Mac and Linux. If your client machine can run Java, you can run Onolytics.

Onolytics is also available as a consultancy service. You provide us with the data and we deliver the results.

Why is understanding ethnicity important for my business?

Identifying the key characteristics of your target population is critical to business success. Businesses that do not properly identify their target markets (where they are, how they behave, what preferences they have, how to reach them, etc.) risk making critical mistakes and failing to reach their potential markets.

Example: Imagine a business decision maker who wants to sell a new product to a target market.

S/he would most likely define the target market for that product in terms of age, gender, geographical location, education level, family composition and socioeconomic status. But what about ethnicity? This added layer of understanding encompasses cultural aspects such as language, religion and the family’s country of origin, but also administrative aspects such as migration history, nationality, places of socialization, degree of assimilation, etc.

Problem: How would you add ethnicity data to your current and potential customer databases so that your marketing is appropriately targeted, and products reach their intended market?

Obstacle: Without spending a great deal of time and resources tracking down the ethnic origin of your current and/or potential customers, you put your business at risk of not reaching the people who are ultimately more likely to buy your products or services, or who may be more profitable for your line of business.

Solution:  Onolytics has taken the guesswork out of identifying the ethnic origin of your current and target markets. Using our simple tools, you can add ethnicity data to a list of names with an extremely high degree of accuracy, validated through numerous external academic evaluations.

Added Value: You can use the ethnic makeup of your current and potential customers in various areas of your business, including product development, marketing, retail planning, sales support, and finance and payments. Onolytics supports business decisions with confidence and in a consistent way over time. This adds accuracy to your planning and reduces critical time-to-market and opportunity costs.

Whether you are a large or small business, Onolytics provides critical data enrichment services, saves you time and money, and adds substantial value to your business.

I am in the research sector in academia, government, or non-profits. How does Onolytics benefit my organization?

Onolytics was originally developed with academic and government users in mind, although the science behind Onolytics is applicable to a wide variety of industries.

With a strong presence in research across academic disciplines and government services, Onolytics has provided critical data to agencies and organizations across the world that need to identify ethnic inequalities in existing data in fields such as:

  • Epidemiology, Public Health and Healthcare
  • Genetic research
  • Computer Science and Social Media
  • Economics, Entrepreneurship and Management
  • Political Science and Elections
  • Education
  • Housing
  • Labor and recruitment discrimination

To read about some of these academic applications, see over 40 publications mentioning or validating Onolytics (formerly known as Onomap).

How accurate is Onolytics?

The science of studying names is called onomastics, hence the root of our name, Onolytics. Name analysis to classify the ethnic origin of people’s names has been carried out since at least the late 19th century, starting with George Darwin, the son of Charles Darwin, who used surnames as an indicator of endogamy in the English aristocracy. Later, the U.S. Census began publishing research using Spanish surnames in the 1940s, and various researchers in the U.S., U.K., Germany, the Netherlands and other countries have consistently used name analysis to ascribe ethnic origin to populations coming from different parts of the world. Most of these studies used only surnames, and not first names, to ascribe a person’s ethnic origin.

Onolytics went beyond this research frontier by adding first-name origins in combination with surnames, along with a score that represents how likely a name is to originate from a certain ethnic group. This combination of first name and surname, together with a set of complex rules bundled in a proprietary algorithm, makes the Onolytics classification very powerful.

Over 40 academic research projects have applied Onolytics and, in some cases, validated its accuracy against known (self-reported) ethnicity. To read more about some of these external academic applications, see over 40 publications mentioning or validating Onolytics (formerly known as Onomap).

Aren’t names more or less random?

Not at all!

Most of us acquired our surnames and first names from our immediate ancestors, either passed down to us over generations or chosen by our parents in ways that are by no means random. Linguistic, religious, regional, cultural and legal factors all shape the ways in which our names are chosen and transmitted over time and across space. Intriguingly, naming conventions usually adhere to unwritten social norms and customs that with time end up producing distinctive ethnic and geographic patterns in name frequency distributions over space. A sort of “name sediment” accretes over time that can be very distinctive of particular places, only altered by migration flows and inter-group marriage between different human groups.

Furthermore, these mostly exceptional events can be disentangled in contemporary name distributions and sometimes traced back to their areas of origin. Prof. Pablo Mateos published a book compiling evidence on these patterns assembled from fields as diverse as linguistics, genetics, epidemiology, economics, geography, demography, sociology, anthropology, psychology, history, genealogy, physics, and computer science. This evidence is woven together into an innovative account of how personal name frequency distributions over space and time follow a set of regularities across societies that have hitherto not been studied from a joint, social science perspective on human difference over space. These are precisely the patterns that Onolytics helps you uncover.

How can I find out more about the science behind Onolytics?

Prof. Mateos’s 2014 book, Names, Ethnicity and Populations: Tracing Identity in Space (Advances in Spatial Science), is available on Amazon, Springer or Barnes&Noble.

You can also read some other key texts explaining the science behind Onolytics:

  • Mateos, Webber and Longley (2007) The Cultural, Ethnic and Linguistic Classification of Populations and Neighbourhoods using Personal Names, CASA Working Paper 116, Centre for Advanced Spatial Analysis, University College London. (Full paper)
  • Mateos, P. (2007) A review of name-based ethnicity classification methods and their potential in population studies, Population Space and Place, 13 (4): 243-263 (Abstract) (Full paper)
  • Mateos, P. (2007) An ontology of ethnicity based upon personal names: with implications for neighbourhood profiling. Doctoral thesis, University of London (PhD thesis)
  • Mateos, P., Longley, P.A. and O’Sullivan, D. (2011) Ethnicity and Population Structure in Personal Naming Networks. PLoS ONE (Public Library of Science) 6 (9) e22943 (Article)

What is Prof. Mateos’ background?

Prof. Pablo Mateos is Associate Professor at the Centre for Research and Advanced Studies in Social Anthropology (CIESAS), Mexico. He was Lecturer in Human Geography at the Department of Geography, University College London (UCL) in the United Kingdom (2008–2012). He obtained a PhD in Social Geography at the University of London (2007). At UCL he was a member of the Migration Research Unit (MRU), an associate of the Centre for Advanced Spatial Analysis (CASA) and Research Fellow at the Centre for Research and Analysis of Migration (CReAM). He is a member of the Mexican National System of Researchers (SNI Level III), the Population Association of America (PAA) and the American Association of Geographers (AAG), and a past member of the Royal Geographical Society, the Royal Statistical Society, the British Society for Population Studies and the European Population Association. He is a member of the UK Economic and Social Research Council (ESRC) Peer Review College and of the editorial boards of various international journals in the social sciences. His research interests lie at the intersections of Social, Urban and Population Geography, and his work focuses on investigating ethnicity, identity, migration, citizenship and urban segregation, primarily in the UK, Spain, US, Mexico and Latin America. He has published over 40 journal articles and book chapters in, amongst others, PLoS ONE, Journal of Ethnic and Migration Studies, Journal of Urban Affairs, Geoforum, Human Biology, and Population Space and Place.

How can I get a copy of Onolytics?

Please visit the ‘Contact Us’ page and we’ll be in touch with you shortly!

In the meantime, you can also test Onolytics by entering individual names on our home page (navigate to the form on the right side) or by testing a batch file of 100 names.

Does Onolytics give race as opposed to ethnicity? Or do you have a guide on how your ethnicity categorisations map to race?

Onolytics classifies a person’s name according to its cultural, linguistic, geographical or religious origin. For short, we call these properties of names “ethnicity”: a multidimensional characteristic of our origins that is still reflected in name patterns. However, ethnicity is not the same as race, even though both are socially constructed categories and are sometimes used interchangeably. Race is a term much more prevalent in the Americas and Asia than in Europe. It denotes certain biological characteristics and phenotypical traits of a person, such as skin color, hair type, body shape and facial features. These features are not reflected in names or language, so they cannot be traced directly from names.

However, some rough equivalences can be established between some Onolytics categories (which we label Types) and the American concept of “race”. For example, Black African and Black Caribbean names map to “Black”; Arab and Muslim names to “Middle East”; South Asian, Chinese and other Asian ethnic groups to “Asian”; Spanish and Latin American names to “Hispanic”; and all European groups to “White”. The alignment is not perfect, but it has proved to work in most settings at an aggregate level.
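For users who need such a guide, the rough equivalences above can be written down as a lookup table and applied to aggregate counts. This is a minimal sketch: the dictionary keys are hypothetical labels rather than actual Onolytics Type names, and the function `aggregate_shares` is invented for this example.

```python
from collections import Counter

# Rough equivalences taken from the answer above. The keys are hypothetical
# labels for illustration, not actual Onolytics Type names.
ROUGH_RACE_EQUIVALENCE = {
    "black_african": "Black",
    "black_caribbean": "Black",
    "arab": "Middle East",
    "muslim": "Middle East",
    "south_asian": "Asian",
    "chinese": "Asian",
    "other_asian": "Asian",
    "spanish": "Hispanic",
    "latin_american": "Hispanic",
    "european": "White",
}


def aggregate_shares(classified_labels, unmapped="Unmapped"):
    """Return the share of each broad category across a classified population.

    Labels without a rough equivalent are counted under `unmapped` instead
    of being forced into a category.
    """
    counts = Counter(
        ROUGH_RACE_EQUIVALENCE.get(label, unmapped) for label in classified_labels
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Hypothetical classified output for four records.
shares = aggregate_shares(["chinese", "south_asian", "spanish", "european"])
```

The function reports shares for a whole population rather than a label per person, reflecting the point that the alignment is only meant to hold at an aggregate level.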