Ethnicity Estimator is an online service that produces an estimated ethnicity distribution for a set of names supplied to it, based on the standard UK ONS ethnicity category groups. When a user supplies a CSV of names, the service returns an indicative population count, split by category. The online service is secure, and supplied name lists are automatically discarded once the categorisation is complete.
The Ethnicity Estimator (EE) classifier is based on research that uses names data assembled by the Consumer Data Research Centre (CDRC). The data are taken from consumer sources and from the Office for National Statistics (ONS), which securely hosts data from England & Wales.
The research enables estimates of the ethnic distribution from datasets which contain names, using the best methodology. Users can now apply to access the Ethnicity Estimator software online. This software provides aggregate classifications reporting on estimated population for each of the standard ONS ethnicity groups.
Applications will be accepted from users who intend to use the software for the public good; applicants may come from academia, government or industry. Please read our full Terms and Conditions (see document below) before making an application. Please note that the application review process takes a number of weeks. Once your application has been approved, a new link on this page will be available to you when you are logged in.
The category groups are:
- ABD: Asian/Asian British - Bangladeshi
- ACN: Asian/Asian British - Chinese
- AIN: Asian/Asian British - Indian
- APK: Asian/Asian British - Pakistani
- AAO: Asian/Asian British - Any Other
- BAF: Black/Black British - African
- BCA: Black/Black British - Caribbean
- WBR: White - English/Welsh/Scottish/Northern Irish/British
- WIR: White - Irish
- WAO: White - Any Other (including Gypsy or Irish Traveller)
- OXX: Any Other Ethnic Group (including Arab)
- Unclassified: Names that could not be classified into one of the above.
A minimum of 100 distinct (unique) names must be supplied in your input file. The application's server will time out if more than approximately 8000 names (including duplicate names) are supplied, so if your names list is longer than this, you will need to prepare multiple input files and run each one in turn.
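The batching described above can be sketched as follows. This is a minimal illustration, not part of the Ethnicity Estimator software: the function names, the 7500-name batch ceiling (a safety margin below the approximate 8000-name limit) and the in-memory list representation are all assumptions. The expected layout of the input CSV is not specified here, so the sketch only splits the list and does not write files.

```python
from typing import List

MIN_DISTINCT = 100  # minimum distinct names per input file (stated above)
MAX_NAMES = 7500    # assumed safety margin below the ~8000-name timeout


def split_names(names: List[str], max_names: int = MAX_NAMES) -> List[List[str]]:
    """Split a names list into near-equal batches of at most max_names entries."""
    if not names:
        return []
    n_batches = -(-len(names) // max_names)  # ceiling division
    base, extra = divmod(len(names), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # The first `extra` batches take one additional name each.
        size = base + (1 if i < extra else 0)
        batches.append(names[start:start + size])
        start += size
    return batches


def undersized(batches: List[List[str]]) -> List[int]:
    """Return the indices of batches with fewer than MIN_DISTINCT distinct names."""
    return [i for i, b in enumerate(batches) if len(set(b)) < MIN_DISTINCT]
```

Splitting into near-equal batches, rather than filling each batch to the ceiling, avoids leaving a short final batch that might fall below the 100-distinct-name minimum. Because duplicates count towards the upper limit but not the lower one, it is still worth checking each batch with `undersized` before uploading.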
Results Perturbation (Noise)
Due to a stipulation from one of the upstream data suppliers, the software adds some "noise" to the results, perturbing the count values by a small amount and mimicking the inherent uncertainty and inaccuracy in predicting an ethnicity solely from a name. This means that running the software repeatedly on the same set of names will produce slightly different numbers each time. A normal distribution is applied to the size of the perturbation, for each name. The Coefficient of Variation (CV) of the "noise" perturbation diminishes for larger datasets. Only rarely will the perturbation significantly change the result.
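To illustrate why the CV shrinks as the dataset grows, the sketch below simulates a simplified noise model. It is not the software's actual algorithm: the per-name noise SD (`SIGMA`), the assumption that each name contributes one unit to a single category, and the independence of the per-name perturbations are all hypothetical choices made for illustration. Under these assumptions the SD of a total grows with the square root of the number of names, so the CV falls as sigma / sqrt(n).

```python
import math
import random
import statistics

SIGMA = 0.5  # hypothetical per-name noise SD; the real value is not published


def perturbed_count(n_names: int, rng: random.Random, sigma: float = SIGMA) -> float:
    """Sum n_names unit counts, each perturbed by independent normal noise."""
    return sum(1.0 + rng.gauss(0.0, sigma) for _ in range(n_names))


def empirical_cv(n_names: int, runs: int = 200, seed: int = 0) -> float:
    """Coefficient of variation (SD / mean) of the perturbed count over many runs."""
    rng = random.Random(seed)
    totals = [perturbed_count(n_names, rng) for _ in range(runs)]
    return statistics.stdev(totals) / statistics.mean(totals)


def theoretical_cv(n_names: int, sigma: float = SIGMA) -> float:
    """Under this simplified model the CV is sigma / sqrt(n)."""
    return sigma / math.sqrt(n_names)
```

Comparing `empirical_cv(50)` with `empirical_cv(2000)` should show a markedly smaller relative spread for the larger list, in line with `theoretical_cv`.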
Here, two names lists, a small one and a large one, are each run 5 times, and the average and standard deviation (SD) are calculated. Low count results (<10) are masked with an asterisk; each masked result is assumed to be 3 when calculating the SUM, but is excluded from the average and SD calculations. The unclassified count is not subject to perturbation.
152 names | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average | SD |
---|---|---|---|---|---|---|---|
WBR | 47.7 | 51.6 | 49.6 | 49.4 | 49.6 | 49.58 | 1.4 |
WIR | * | * | * | * | * | | |
WAO | 27.3 | 25.2 | 26.3 | 25.9 | 28.2 | 26.58 | 1.2 |
AIN | 12.8 | 10.6 | 12.9 | 12.4 | * | 12.18 | 1.1 |
APK | * | * | * | * | * | | |
ABD | * | * | * | * | * | | |
ACN | 28.4 | 28.2 | 24.1 | 26 | 27.8 | 26.9 | 1.8 |
AAO | * | * | * | * | * | | |
BAF | * | * | * | * | * | | |
BCA | * | * | * | * | * | | |
OXX | * | 12.7 | 12.6 | 12.6 | 10.5 | 12.1 | 1.1 |
unclassified | 11 | 11 | 11 | 11 | 11 | 11 | 0 |
SUM | 148.2 | 157.3 | 154.5 | 155.3 | 148.1 | 152.68 | 4.3 |
7999 names | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average | SD |
---|---|---|---|---|---|---|---|
WBR | 4668.3 | 4717.3 | 4702.7 | 4702.8 | 4699.8 | 4698.18 | 18.0 |
WIR | 227.5 | 245.6 | 218.7 | 225.3 | 225 | 228.42 | 10.1 |
WAO | 853.2 | 815.1 | 829.8 | 805.2 | 821.9 | 825.04 | 18.2 |
AIN | 817.9 | 823.7 | 818.4 | 824.6 | 822.8 | 821.48 | 3.1 |
APK | 125.8 | 125.7 | 125.2 | 126 | 138.7 | 128.28 | 5.8 |
ABD | 17.1 | 25.5 | 31.8 | 30.2 | 27.1 | 26.34 | 5.7 |
ACN | 347.6 | 341.9 | 350.7 | 360.8 | 354.1 | 351.02 | 7.1 |
AAO | 58.3 | 67.2 | 64 | 63.9 | 73.7 | 65.42 | 5.6 |
BAF | 95.2 | 98.4 | 105.1 | 102.4 | 114.1 | 103.04 | 7.2 |
BCA | 114.5 | 128 | 118.6 | 119.4 | 104.4 | 116.98 | 8.6 |
OXX | 528.1 | 553.5 | 547.7 | 553.9 | 526 | 541.84 | 13.7 |
unclassified | 104 | 104 | 104 | 104 | 104 | 104 | 0 |
SUM | 7957.5 | 8045.9 | 8016.7 | 8018.5 | 8011.6 | 8010.04 | 32.3 |
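
The summary columns in the tables above can be reproduced with a short sketch. The helper names are hypothetical, and masked cells are represented as `None`. The use of the sample standard deviation (n - 1 denominator) is an inference: it appears to match the published SD column, but the page does not state which formula was used.

```python
import statistics
from typing import List, Optional, Tuple

MASKED_ASSUMED = 3.0  # value assumed for a masked (<10) cell when summing a run


def run_sum(cells: List[Optional[float]]) -> float:
    """Sum one run's column, counting each masked cell (None) as 3."""
    return sum(MASKED_ASSUMED if c is None else c for c in cells)


def mean_sd(row: List[Optional[float]]) -> Optional[Tuple[float, float]]:
    """Mean and sample SD across runs, ignoring masked cells."""
    seen = [c for c in row if c is not None]
    if len(seen) < 2:
        return None  # too few unmasked runs to compute an SD
    return statistics.mean(seen), statistics.stdev(seen)


# Run 1 of the 152-name table, in row order WBR to unclassified.
run1 = [47.7, None, 27.3, 12.8, None, None, 28.4, None, None, None, None, 11]
# AIN row of the 152-name table across the five runs (Run 5 is masked).
ain = [12.8, 10.6, 12.9, 12.4, None]

print(round(run_sum(run1), 1))  # SUM for Run 1
print(mean_sd(ain))             # average and SD for AIN
```

With these inputs, `run_sum(run1)` gives the 148.2 shown in the SUM row (five unmasked values plus seven masked cells at 3 each), and `mean_sd(ain)` gives roughly 12.18 and 1.1, matching the AIN row.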
Field | Value |
---|---|
Source | ONS |
Data and Resources
- Sample Output File (CSV)
A sample CSV output file from the Ethnicity Estimator software.
Field | Value |
---|---|
Modified | 2024-11-20 |
Release Date | 2019-11-18 |
Spatial / Geographical Coverage Location | England and Wales |
Granularity | Ethnic Group |
Author | |
Contact Name | Oliver O'Brien |
Contact Email | |