Reference CSV schema, data requirements, and sample audit reports.
Your uploaded file must meet the following requirements:
Required columns: candidate_id, sex, race_ethnicity
Model output: model_score (0–1) or model_pred (0/1) is required; if both exist, model_score will be used
Missing data: the share of rows missing sex or race_ethnicity must be < 20%
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "BiasBeacon CSV Row Schema",
  "description": "Each row represents one candidate's model evaluation record. The file must contain a header row and one or more candidate rows.",
  "type": "array",
  "minItems": 1,
  "items": {
    "type": "object",
    "properties": {
      "candidate_id": {
        "type": "string",
        "description": "Unique anonymized candidate identifier (must not include PII)."
      },
      "sex": {
        "type": "string",
        "description": "Candidate sex (M/F or equivalent coding). Leave empty if unknown."
      },
      "race_ethnicity": {
        "type": "string",
        "description": "Candidate race or ethnicity. Leave empty if unknown. Consistent labeling is required."
      },
      "model_score": {
        "type": "number",
        "minimum": 0,
        "maximum": 1,
        "description": "Predicted probability or continuous model score between 0 and 1. Optional if model_pred is provided."
      },
      "model_pred": {
        "type": "integer",
        "enum": [
          0,
          1
        ],
        "description": "Binary model output (0 = not selected, 1 = selected). Optional if model_score is provided."
      }
    },
    "required": [
      "candidate_id",
      "sex",
      "race_ethnicity"
    ],
    "anyOf": [
      {
        "required": [
          "model_score"
        ]
      },
      {
        "required": [
          "model_pred"
        ]
      }
    ],
    "additionalProperties": false
  }
}
You can use either the model_score column or the model_pred column.
Example with model_score:
candidate_id,sex,race_ethnicity,model_score
1001,M,white,0.87
1002,F,black,0.43
1003,F,,0.66
1004,M,hispanic,0.65

Example with model_pred:
candidate_id,sex,race_ethnicity,model_pred
2001,F,white,1
2002,M,,0
2003,F,asian,1
2004,,hispanic,1
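The schema rules above can be checked locally before uploading. The following is a minimal sketch using only the Python standard library, not BiasBeacon's own validator: the function names `check_header`, `check_row`, and `validate_csv` are hypothetical, and the checks simply mirror the documented rules (three required columns, at least one of model_score or model_pred, scores within 0 to 1, predictions of 0 or 1, no extra columns).

```python
import csv
import io

REQUIRED_COLUMNS = {"candidate_id", "sex", "race_ethnicity"}
ALLOWED_COLUMNS = REQUIRED_COLUMNS | {"model_score", "model_pred"}


def check_header(fieldnames):
    """Return a list of header problems; an empty list means the header is acceptable."""
    columns = set(fieldnames or [])
    problems = []
    missing = REQUIRED_COLUMNS - columns
    if missing:
        problems.append(f"missing required columns: {sorted(missing)}")
    extra = columns - ALLOWED_COLUMNS
    if extra:
        problems.append(f"unexpected columns: {sorted(extra)}")
    if not columns & {"model_score", "model_pred"}:
        problems.append("need a model_score or model_pred column")
    return problems


def check_row(row):
    """Return a list of problems for one row. model_score takes precedence if both are set."""
    problems = []
    if not (row.get("candidate_id") or "").strip():
        problems.append("candidate_id is empty")
    score = (row.get("model_score") or "").strip()
    pred = (row.get("model_pred") or "").strip()
    if score:
        try:
            value = float(score)
        except ValueError:
            problems.append(f"model_score {score!r} is not a number")
        else:
            if not 0.0 <= value <= 1.0:
                problems.append(f"model_score {score} is outside [0, 1]")
    elif pred:
        if pred not in {"0", "1"}:
            problems.append(f"model_pred {pred!r} is not 0 or 1")
    else:
        problems.append("row has neither model_score nor model_pred")
    return problems


def validate_csv(text):
    """Validate CSV text and return all problems found, prefixed with the file line number."""
    reader = csv.DictReader(io.StringIO(text))
    problems = check_header(reader.fieldnames)
    if problems:
        return problems
    row_count = 0
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        row_count += 1
        problems.extend(f"line {line_no}: {p}" for p in check_row(row))
    if row_count == 0:
        problems.append("file has no candidate rows")
    return problems


sample = """candidate_id,sex,race_ethnicity,model_score
1001,M,white,0.87
1002,F,black,0.43
1003,F,,0.66
1004,M,hispanic,0.65
"""
print(validate_csv(sample))
```

Empty sex or race_ethnicity values are deliberately not treated as errors here, since the schema allows leaving them empty when unknown.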
If your CSV doesn’t fully meet quality requirements, BiasBeacon will still generate the audit but include warning banners in the final PDF report. These indicate reduced statistical confidence or representativeness.
The dataset contains fewer than 1,000 rows. Bias estimates may be unstable or unreliable with such limited data.
More than 20% of rows are missing at least one protected attribute (sex or race_ethnicity), which reduces the accuracy of group comparisons.
The dataset covers less than 3 months of hiring data and includes fewer than 10,000 rows. Temporal coverage may be too short for stable, representative metrics.
The dataset covers less than 6 months and includes fewer than 5,000 rows. The sample may not fully capture historical or seasonal hiring patterns.
These warnings appear automatically in your audit PDF if the upload violates any of the thresholds. They do not block report generation, but they signal reduced audit reliability.
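The thresholds described above can be sketched as a small pre-upload check. This is an illustration under stated assumptions, not BiasBeacon's implementation: the function names and warning labels are hypothetical, and `months_covered` must be supplied by the caller because the CSV schema has no date column. How BiasBeacon treats a file with exactly 20% missing rows, or whether it shows both coverage warnings together, is not stated in the source; this sketch follows the warning wording ("more than 20%") and evaluates each condition independently.

```python
def count_missing_protected(rows):
    """Count rows where sex or race_ethnicity is empty (rows are dicts keyed by column name)."""
    return sum(
        1
        for row in rows
        if not (row.get("sex") or "").strip()
        or not (row.get("race_ethnicity") or "").strip()
    )


def audit_warnings(total_rows, rows_missing_protected, months_covered):
    """Return hypothetical labels for each documented warning condition that applies."""
    warnings = []
    if total_rows < 1_000:
        warnings.append("small_sample")
    if total_rows > 0 and rows_missing_protected / total_rows > 0.20:
        warnings.append("missing_protected_attributes")
    if months_covered < 3 and total_rows < 10_000:
        warnings.append("coverage_under_3_months")
    if months_covered < 6 and total_rows < 5_000:
        warnings.append("coverage_under_6_months")
    return warnings


# Four rows, one of them missing race_ethnicity, assumed to span one month.
rows = [
    {"candidate_id": "1001", "sex": "M", "race_ethnicity": "white"},
    {"candidate_id": "1002", "sex": "F", "race_ethnicity": "black"},
    {"candidate_id": "1003", "sex": "F", "race_ethnicity": ""},
    {"candidate_id": "1004", "sex": "M", "race_ethnicity": "hispanic"},
]
print(audit_warnings(len(rows), count_missing_protected(rows), months_covered=1))
```

A tiny file like this one triggers every warning, which is expected: it has far fewer than 1,000 rows, 25% of its rows lack a protected attribute, and it covers under 3 months.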
View a sample NYC LL 144 bias audit report generated from a compliant dataset: