Configuring a Reconciliation Job

This is the complete reference for configuring a reconciliation job. It covers every option you can set, what it does, and how the options interact. For the behaviour behind these options — how types are inferred, how filters compare, how tolerances are evaluated — see Data Types, Filtering & Comparison.

Job shape at a glance

A job ties together two data sources and tells DataRecs what to compare and how strict to be. The top-level fields are:

Field	Required	Purpose
`name`	Yes	Human-readable job name. Appears in notifications and run history.
`workspace_id`	Yes	The workspace the job belongs to.
`dimensions`	Yes	Columns to group by (the `GROUP BY` keys).
`measures`	Yes	Column + aggregation pairs to compare. At least one required.
`connections`	Yes	The two data sources, with their source location, column mapping, and filters.
`stages`	Yes	One or more comparison stages, each with its own dimensions and tolerances.
`stage_execution_strategy`	No	How multiple stages run. Defaults to `DRILL_DOWN_ON_FAILURE`.
`max_stage_concurrency`	No	Caps parallelism when the strategy is `PARALLEL`.
`notification_emails`	No	Addresses that receive the result email.
`email_detail_level`	No	How much detail the result email includes.
`max_email_rows`	No	Caps the number of result rows included in detailed emails.

{
  "name": "daily-revenue-by-region",
  "workspace_id": "4c60b399-37c6-4e3b-9337-7ab4d7692790",
  "dimensions": ["Region"],
  "measures": [
    { "name": "Amount", "aggregation": "SUM" }
  ],
  "connections": [ /* see Connections */ ],
  "stages": [
    {
      "dimensions": ["Region"],
      "tolerances": [
        { "measure_name": "Amount", "type": "ABSOLUTE", "value": 0.01 }
      ]
    }
  ],
  "stage_execution_strategy": "DRILL_DOWN_ON_FAILURE",
  "notification_emails": ["data-team@example.com"],
  "email_detail_level": "FULL_DATA"
}

Dimensions

dimensions are the columns DataRecs groups by before comparing — the equivalent of SQL GROUP BY. Measures are computed within each group, and groups are matched across the two sources by their dimension values.

"dimensions": ["Region", "Category"]

Use the canonical column names here (the keys you define in each connection’s column_mapping), not the raw source column names.
Rows with an empty dimension value are grouped together under a single “empty” group, consistently on both sources.
An empty dimensions array compares a single grand-total group (every row folded together).

Measures

measures define what to compare. Each measure pairs a name (a canonical column, or any label for COUNT_STAR) with an aggregation.

"measures": [
  { "name": "Amount",   "aggregation": "SUM" },
  { "name": "OrderId",  "aggregation": "COUNT_DISTINCT" },
  { "name": "rowcount", "aggregation": "COUNT_STAR" }
]

Aggregation types

Aggregation	Computes	Column requirement
`SUM`	Total of the values	Numeric
`AVG`	Mean of the values	Numeric
`MIN`	Smallest value	Numeric, text, or date
`MAX`	Largest value	Numeric, text, or date
`COUNT`	Count of non-empty values	Any
`COUNT_DISTINCT`	Count of distinct values	Any
`COUNT_STAR`	Count of rows (no column needed)	—
`VARIANCE`	Statistical variance	Numeric
`STD_DEV`	Standard deviation	Numeric

Connections

connections is an array of exactly two data sources. Each entry locates the data and maps its columns onto your canonical names.

"connections": [
  {
    "connection_id": "4e809504-9cfa-4c94-b66d-2ce9bd53ea08",
    "source_database": "Quarterly Sheet (1AbC...xyz)",
    "source_schema": "Q1_Sales",
    "source_table": "Q1_Sales",
    "column_mapping": { "Region": "region_name", "Amount": "net_amount" },
    "filters": [],
    "alias": "Google Sheet"
  },
  {
    "connection_id": "75479276-1431-4fac-a179-06ccd8fbafc6",
    "source_database": "Finance Workbook (01BPDZ...4E7UXF)",
    "source_schema": "Q1_Sales",
    "source_table": "Q1_Sales",
    "column_mapping": { "Region": "Region", "Amount": "Amount" },
    "filters": [],
    "alias": "Excel/SharePoint"
  }
]

Field	Required	Purpose
`connection_id`	Yes	The configured connection to extract through.
`source_schema`	Yes	The schema (database sources) or tab/worksheet (spreadsheets) to read.
`source_table`	Yes	The table, or the named table/range within a spreadsheet tab.
`source_database`	Sometimes	The database, spreadsheet, or workbook identifier. Required for spreadsheet sources (see below).
`column_mapping`	Yes	Maps your canonical column names to the actual column names in this source.
`filters`	No	Per-source filter conditions (see Filters).
`alias`	No	A display label for this source in results and emails.

Column mapping

column_mapping is how DataRecs reconciles sources whose column names differ. The keys are your canonical names (used everywhere else in the job — dimensions, measures, filters); the values are the real column names in that specific source.

"column_mapping": { "Region": "region_name", "Amount": "net_amount" }

If both sources use the same column names, the mapping is an identity map ({ "Region": "Region" }). The mapping must include every dimension and every measure name; it may additionally include any column you filter on.

`source_database` for spreadsheets

For database sources, source_database is optional. For spreadsheet sources it is required and encodes the spreadsheet/workbook identifier:

Google Sheets — the spreadsheet ID, conventionally written as a display label with the ID in trailing parentheses: "My Sheet (1AbC...xyz)". DataRecs reads the ID from the parentheses.
Excel/SharePoint — the workbook’s drive-item ID in trailing parentheses: "My Workbook (01BPDZ...4E7UXF)". The SharePoint site is taken from the connection configuration.

Filters

filters restrict which rows each source contributes. They are applied per source, before aggregation, and multiple filters are combined with AND. Apply the same filters to both connections so each side compares the same logical subset.

"filters": [
  { "column": "Status", "operator": "=",  "values": ["Active"] },
  { "column": "Amount", "operator": ">",  "values": ["100"] }
]

Field	Required	Purpose
`column`	Yes	The canonical column to filter on.
`operator`	Yes	One of the operators below.
`values`	Sometimes	The comparison value(s). Omitted for `IS NULL` / `IS NOT NULL`.

Operators

Operator	`values`	Example
`=`	one	`{ "column": "Status", "operator": "=", "values": ["Active"] }`
`!=`	one	`{ "column": "Status", "operator": "!=", "values": ["Excluded"] }`
`>` `<` `>=` `<=`	one	`{ "column": "Amount", "operator": ">=", "values": ["100"] }`
`IN`	one or more	`{ "column": "Region", "operator": "IN", "values": ["US", "UK"] }`
`NOT IN`	one or more	`{ "column": "Region", "operator": "NOT IN", "values": ["EU", "APAC"] }`
`LIKE`	one	`{ "column": "Name", "operator": "LIKE", "values": ["A%"] }`
`IS NULL`	none	`{ "column": "Notes", "operator": "IS NULL" }`
`IS NOT NULL`	none	`{ "column": "Notes", "operator": "IS NOT NULL" }`

Behaviour highlights (full detail in Data Types, Filtering & Comparison):

Range operators (>, <, >=, <=) compare numerically on numeric columns and lexicographically on text columns. >= and <= include the boundary value.
IS NULL / IS NOT NULL treat an empty string "" the same as NULL.
A filter that excludes every row yields a MATCHED result with zero rows — not an error.
values may contain {{parameter}} placeholders resolved at run time.

Stages and tolerances

stages is an array of one or more comparison passes. Each stage has its own dimensions (grouping) and tolerances (per-measure strictness). A single-stage job is the common case; multiple stages enable drill-down.

"stages": [
  {
    "dimensions": ["Region"],
    "tolerances": [
      { "measure_name": "Amount",  "type": "ABSOLUTE", "value": 0.01 },
      { "measure_name": "Revenue", "type": "PERCENT",  "value": 5 }
    ]
  }
]

Tolerances

A tolerance says how far a measure may differ between sources before the group is a mismatch.

Field	Purpose
`measure_name`	The measure this tolerance applies to. Must match a defined measure.
`type`	`ABSOLUTE` or `PERCENT`.
`value`	The threshold.

ABSOLUTE — the raw difference may be at most value. A group fails when delta > value.
PERCENT — the percentage difference (relative to the smaller source value) may be at most value. A group fails when percentage_difference > value.
No tolerance, or value: 0 — any non-zero difference is a mismatch (exact match required).

With multiple measures, a group is MATCHED only if every measure passes. One breach fails the group. A zero-versus-non-zero difference under a PERCENT tolerance always fails (its percentage difference is treated as 100%).

Stage execution strategies

stage_execution_strategy controls how multiple stages run. For a single-stage job all strategies behave identically.

Strategy	Behaviour
`DRILL_DOWN_ON_FAILURE` (default)	Stages run from coarsest to finest. A stage only runs for the groups that failed the previous stage. If a coarse stage matches, the finer stages are skipped. Efficient for large reconciliations.
`SEQUENTIAL`	Every stage runs in order, independently, regardless of pass/fail. No drill-down filtering.
`PARALLEL`	All stages run concurrently and independently. Use `max_stage_concurrency` to cap how many run at once.

Drill-down dimension rule

When using DRILL_DOWN_ON_FAILURE, each stage must include all dimensions of the previous stage (it can only get finer, never change keys). For example:

"stages": [
  { "dimensions": ["Region"], "tolerances": [...] },
  { "dimensions": ["Region", "Category"], "tolerances": [...] },
  { "dimensions": ["Region", "Category", "Product"], "tolerances": [...] }
]

Stage 0 compares per Region. Any Region that fails is drilled into per Region + Category in stage 1, and failing Region + Category groups are drilled into per Product in stage 2. SEQUENTIAL and PARALLEL stages have no such restriction and may use independent dimension sets.

max_stage_concurrency only applies to PARALLEL and caps how many stages execute at once.

Notifications

Field	Purpose
`notification_emails`	Addresses that receive the result email when a run finalises. Validated against your tenant’s recipient rules.
`email_detail_level`	How much detail the email includes (e.g. summary only vs. full result data). A per-job override can only lower the workspace/tenant level, never raise it.
`max_email_rows`	Caps the number of result rows in detailed emails. Capped by the tenant’s `max_email_rows`.

Exactly one notification email is sent per finalised run. A cancelled run does not send a result email.

Validation rules

DataRecs rejects a job at creation time if any of these are violated:

measures must be non-empty.
stages must be non-empty.
Every dimension and measure name must appear as a key in every connection’s column_mapping. Extra mapping keys are allowed only if they are filter columns.
Every tolerance measure_name must reference a defined measure.
Under DRILL_DOWN_ON_FAILURE, each stage must include all dimensions of the previous stage.
A filter must supply values unless its operator is IS NULL or IS NOT NULL.

Next steps

Data Types, Filtering & Comparison — the behaviour behind these options.
Running a Reconciliation — creating and triggering a job.
Reconciliation Run Lifecycle — run states and events.