Configuring a Reconciliation Job
This is the complete reference for configuring a reconciliation job. It covers every option you can set, what it does, and how the options interact. For the behaviour behind these options — how types are inferred, how filters compare, how tolerances are evaluated — see Data Types, Filtering & Comparison.
Job shape at a glance
Section titled “Job shape at a glance”A job ties together two data sources and tells DataRecs what to compare and how strict to be. The top-level fields are:
| Field | Required | Purpose |
|---|---|---|
name | Yes | Human-readable job name. Appears in notifications and run history. |
workspace_id | Yes | The workspace the job belongs to. |
dimensions | Yes | Columns to group by (the GROUP BY keys). |
measures | Yes | Column + aggregation pairs to compare. At least one required. |
connections | Yes | The two data sources, with their source location, column mapping, and filters. |
stages | Yes | One or more comparison stages, each with its own dimensions and tolerances. |
stage_execution_strategy | No | How multiple stages run. Defaults to DRILL_DOWN_ON_FAILURE. |
max_stage_concurrency | No | Caps parallelism when the strategy is PARALLEL. |
notification_emails | No | Addresses that receive the result email. |
email_detail_level | No | How much detail the result email includes. |
max_email_rows | No | Caps the number of result rows included in detailed emails. |
{ "name": "daily-revenue-by-region", "workspace_id": "4c60b399-37c6-4e3b-9337-7ab4d7692790", "dimensions": ["Region"], "measures": [ { "name": "Amount", "aggregation": "SUM" } ], "connections": [ /* see Connections */ ], "stages": [ { "dimensions": ["Region"], "tolerances": [ { "measure_name": "Amount", "type": "ABSOLUTE", "value": 0.01 } ] } ], "stage_execution_strategy": "DRILL_DOWN_ON_FAILURE", "notification_emails": ["data-team@example.com"], "email_detail_level": "FULL_DATA"}Dimensions
Section titled “Dimensions”dimensions are the columns DataRecs groups by before comparing — the equivalent of SQL GROUP BY. Measures are computed within each group, and groups are matched across the two sources by their dimension values.
"dimensions": ["Region", "Category"]- Use the canonical column names here (the keys you define in each connection’s
column_mapping), not the raw source column names. - Rows with an empty dimension value are grouped together under a single “empty” group, consistently on both sources.
- An empty
dimensionsarray compares a single grand-total group (every row folded together).
Measures
Section titled “Measures”measures define what to compare. Each measure pairs a name (a canonical column, or any label for COUNT_STAR) with an aggregation.
"measures": [ { "name": "Amount", "aggregation": "SUM" }, { "name": "OrderId", "aggregation": "COUNT_DISTINCT" }, { "name": "rowcount", "aggregation": "COUNT_STAR" }]Aggregation types
Section titled “Aggregation types”| Aggregation | Computes | Column requirement |
|---|---|---|
SUM | Total of the values | Numeric |
AVG | Mean of the values | Numeric |
MIN | Smallest value | Numeric, text, or date |
MAX | Largest value | Numeric, text, or date |
COUNT | Count of non-empty values | Any |
COUNT_DISTINCT | Count of distinct values | Any |
COUNT_STAR | Count of rows (no column needed) | — |
VARIANCE | Statistical variance | Numeric |
STD_DEV | Standard deviation | Numeric |
Connections
Section titled “Connections”connections is an array of exactly two data sources. Each entry locates the data and maps its columns onto your canonical names.
"connections": [ { "connection_id": "4e809504-9cfa-4c94-b66d-2ce9bd53ea08", "source_database": "Quarterly Sheet (1AbC...xyz)", "source_schema": "Q1_Sales", "source_table": "Q1_Sales", "column_mapping": { "Region": "region_name", "Amount": "net_amount" }, "filters": [], "alias": "Google Sheet" }, { "connection_id": "75479276-1431-4fac-a179-06ccd8fbafc6", "source_database": "Finance Workbook (01BPDZ...4E7UXF)", "source_schema": "Q1_Sales", "source_table": "Q1_Sales", "column_mapping": { "Region": "Region", "Amount": "Amount" }, "filters": [], "alias": "Excel/SharePoint" }]| Field | Required | Purpose |
|---|---|---|
connection_id | Yes | The configured connection to extract through. |
source_schema | Yes | The schema (database sources) or tab/worksheet (spreadsheets) to read. |
source_table | Yes | The table, or the named table/range within a spreadsheet tab. |
source_database | Sometimes | The database, spreadsheet, or workbook identifier. Required for spreadsheet sources (see below). |
column_mapping | Yes | Maps your canonical column names to the actual column names in this source. |
filters | No | Per-source filter conditions (see Filters). |
alias | No | A display label for this source in results and emails. |
Column mapping
Section titled “Column mapping”column_mapping is how DataRecs reconciles sources whose column names differ. The keys are your canonical names (used everywhere else in the job — dimensions, measures, filters); the values are the real column names in that specific source.
"column_mapping": { "Region": "region_name", "Amount": "net_amount" }If both sources use the same column names, the mapping is an identity map ({ "Region": "Region" }). The mapping must include every dimension and every measure name; it may additionally include any column you filter on.
source_database for spreadsheets
Section titled “source_database for spreadsheets”For database sources, source_database is optional. For spreadsheet sources it is required and encodes the spreadsheet/workbook identifier:
- Google Sheets — the spreadsheet ID, conventionally written as a display label with the ID in trailing parentheses:
"My Sheet (1AbC...xyz)". DataRecs reads the ID from the parentheses. - Excel/SharePoint — the workbook’s drive-item ID in trailing parentheses:
"My Workbook (01BPDZ...4E7UXF)". The SharePoint site is taken from the connection configuration.
Filters
Section titled “Filters”filters restrict which rows each source contributes. They are applied per source, before aggregation, and multiple filters are combined with AND. Apply the same filters to both connections so each side compares the same logical subset.
"filters": [ { "column": "Status", "operator": "=", "values": ["Active"] }, { "column": "Amount", "operator": ">", "values": ["100"] }]| Field | Required | Purpose |
|---|---|---|
column | Yes | The canonical column to filter on. |
operator | Yes | One of the operators below. |
values | Sometimes | The comparison value(s). Omitted for IS NULL / IS NOT NULL. |
Operators
Section titled “Operators”| Operator | values | Example |
|---|---|---|
= | one | { "column": "Status", "operator": "=", "values": ["Active"] } |
!= | one | { "column": "Status", "operator": "!=", "values": ["Excluded"] } |
> < >= <= | one | { "column": "Amount", "operator": ">=", "values": ["100"] } |
IN | one or more | { "column": "Region", "operator": "IN", "values": ["US", "UK"] } |
NOT IN | one or more | { "column": "Region", "operator": "NOT IN", "values": ["EU", "APAC"] } |
LIKE | one | { "column": "Name", "operator": "LIKE", "values": ["A%"] } |
IS NULL | none | { "column": "Notes", "operator": "IS NULL" } |
IS NOT NULL | none | { "column": "Notes", "operator": "IS NOT NULL" } |
Behaviour highlights (full detail in Data Types, Filtering & Comparison):
- Range operators (
>,<,>=,<=) compare numerically on numeric columns and lexicographically on text columns.>=and<=include the boundary value. IS NULL/IS NOT NULLtreat an empty string""the same asNULL.- A filter that excludes every row yields a MATCHED result with zero rows — not an error.
valuesmay contain{{parameter}}placeholders resolved at run time.
Stages and tolerances
Section titled “Stages and tolerances”stages is an array of one or more comparison passes. Each stage has its own dimensions (grouping) and tolerances (per-measure strictness). A single-stage job is the common case; multiple stages enable drill-down.
"stages": [ { "dimensions": ["Region"], "tolerances": [ { "measure_name": "Amount", "type": "ABSOLUTE", "value": 0.01 }, { "measure_name": "Revenue", "type": "PERCENT", "value": 5 } ] }]Tolerances
Section titled “Tolerances”A tolerance says how far a measure may differ between sources before the group is a mismatch.
| Field | Purpose |
|---|---|
measure_name | The measure this tolerance applies to. Must match a defined measure. |
type | ABSOLUTE or PERCENT. |
value | The threshold. |
ABSOLUTE— the raw difference may be at mostvalue. A group fails whendelta > value.PERCENT— the percentage difference (relative to the smaller source value) may be at mostvalue. A group fails whenpercentage_difference > value.- No tolerance, or
value: 0— any non-zero difference is a mismatch (exact match required).
With multiple measures, a group is MATCHED only if every measure passes. One breach fails the group. A zero-versus-non-zero difference under a PERCENT tolerance always fails (its percentage difference is treated as 100%).
Stage execution strategies
Section titled “Stage execution strategies”stage_execution_strategy controls how multiple stages run. For a single-stage job all strategies behave identically.
| Strategy | Behaviour |
|---|---|
DRILL_DOWN_ON_FAILURE (default) | Stages run from coarsest to finest. A stage only runs for the groups that failed the previous stage. If a coarse stage matches, the finer stages are skipped. Efficient for large reconciliations. |
SEQUENTIAL | Every stage runs in order, independently, regardless of pass/fail. No drill-down filtering. |
PARALLEL | All stages run concurrently and independently. Use max_stage_concurrency to cap how many run at once. |
Drill-down dimension rule
Section titled “Drill-down dimension rule”When using DRILL_DOWN_ON_FAILURE, each stage must include all dimensions of the previous stage (it can only get finer, never change keys). For example:
"stages": [ { "dimensions": ["Region"], "tolerances": [...] }, { "dimensions": ["Region", "Category"], "tolerances": [...] }, { "dimensions": ["Region", "Category", "Product"], "tolerances": [...] }]Stage 0 compares per Region. Any Region that fails is drilled into per Region + Category in stage 1, and failing Region + Category groups are drilled into per Product in stage 2. SEQUENTIAL and PARALLEL stages have no such restriction and may use independent dimension sets.
max_stage_concurrency only applies to PARALLEL and caps how many stages execute at once.
Notifications
Section titled “Notifications”| Field | Purpose |
|---|---|
notification_emails | Addresses that receive the result email when a run finalises. Validated against your tenant’s recipient rules. |
email_detail_level | How much detail the email includes (e.g. summary only vs. full result data). A per-job override can only lower the workspace/tenant level, never raise it. |
max_email_rows | Caps the number of result rows in detailed emails. Capped by the tenant’s max_email_rows. |
Exactly one notification email is sent per finalised run. A cancelled run does not send a result email.
Validation rules
Section titled “Validation rules”DataRecs rejects a job at creation time if any of these are violated:
measuresmust be non-empty.stagesmust be non-empty.- Every dimension and measure name must appear as a key in every connection’s
column_mapping. Extra mapping keys are allowed only if they are filter columns. - Every tolerance
measure_namemust reference a defined measure. - Under
DRILL_DOWN_ON_FAILURE, each stage must include all dimensions of the previous stage. - A filter must supply
valuesunless its operator isIS NULLorIS NOT NULL.
Next steps
Section titled “Next steps”- Data Types, Filtering & Comparison — the behaviour behind these options.
- Running a Reconciliation — creating and triggering a job.
- Reconciliation Run Lifecycle — run states and events.