> ## Documentation Index
> Fetch the complete documentation index at: https://help.statisfy.com/llms.txt
> Use this file to discover all available pages before exploring further.

# BigQuery Integration

> Connect Google BigQuery to Statisfy to sync product usage, custom fields, and custom objects with self-serve query configuration.

### Connect BigQuery with Statisfy

The BigQuery integration lets you import data from your Google BigQuery warehouse into Statisfy. Author SQL queries in the Statisfy settings UI, preview results, and schedule them — sync product usage, account-level custom fields, person-level attributes, or custom objects on whatever cadence you need.

BigQuery uses the same self-serve configuration model as the other warehouse integrations. You can find the shared concepts — query types, account resolvers, transformers, schedules, configuration history — in [Integration Concepts](/integrations/self_serve_configuration). This page focuses on what's specific to BigQuery.

**Prerequisites:**

* **Admin access** to Statisfy
* A Google Cloud project with BigQuery enabled
* A **service account** with read access to the datasets and tables you want to sync
* A downloaded **service account JSON key**

### What You Can Import

| Data Type          | Description                           | Use Case                                     |
| ------------------ | ------------------------------------- | -------------------------------------------- |
| **Product Usage**  | Time-series usage metrics per account | MAU, API calls, feature usage                |
| **Custom Fields**  | Account- or person-level attributes   | Health scores, segments, industry, plan tier |
| **Custom Objects** | Domain-specific records               | Subscriptions, deployments, contracts        |
| **Contact Data**   | Person-level attribute values         | Activity dates, role, status                 |

### Set Up the Service Account

1. In the **Google Cloud Console**, open the project that hosts your BigQuery data.

2. Go to **IAM & Admin → Service Accounts** and create a new service account dedicated to Statisfy.

3. Grant the service account, at minimum:
   * `roles/bigquery.dataViewer` on the datasets / project you want Statisfy to read
   * `roles/bigquery.jobUser` on the project Statisfy will run jobs in (needed to execute queries)

4. Create a **JSON key** for the service account and download it. You'll upload this to Statisfy in the next step.

<Warning>
  Grant the service account only the datasets and tables Statisfy needs. The credential is stored encrypted, but limiting the blast radius is good hygiene — rotate the key periodically.
</Warning>

### Steps to Connect

1. **Log in to Statisfy.**

2. Navigate to **Integrations → Admin Apps → BigQuery → Connect**.

3. Upload your **service account JSON key file**.

4. (Optional) Enter a **default project ID** — the project Statisfy will run jobs against if a query doesn't specify its own. Most tenants want this set to the project that contains both the credential and the data.

5. Click **Test Connection** to confirm Statisfy can authenticate and run a no-op query.

6. Once connected, BigQuery will appear as **Connected** on the Integrations page.

### Configure Queries

Open **Integrations → BigQuery → Settings** to:

* **Browse datasets and tables** the service account can see, with column types and previews
* **Author queries** using BigQuery Standard SQL with autocomplete and a preview pane
* **Map results** to Statisfy accounts (via Statisfy account ID, CRM ID, organization ID, email domain, account name, or any custom field) and to fields, metrics, or custom objects
* **Apply column transformers** (`strip_prefix`, `regex_extract`, `lookup`, etc.) to reshape values before they're written
* **Set the cadence** — daily, weekly, monthly, or a sub-day interval (HOURLY through TWELVE\_HOURS)
* **Run a one-off sync** from the same page

Each saved query is one of three types:

| Query Type      | Output                                    |
| --------------- | ----------------------------------------- |
| `product_usage` | Time-series metric values per account     |
| `custom_field`  | Account- or person-level attribute values |
| `custom_object` | Records of a custom object type           |

See [Query types](/integrations/self_serve_configuration#query-types) and [Account resolvers](/integrations/self_serve_configuration#account-resolvers) for the shared model.

### BigQuery-Specific Notes

* **Project qualification.** If your data lives in a different GCP project than the default, fully qualify table names in your query (`` `other-project.dataset.table` ``).
* **Date placeholders.** Use `{START_DATE}` / `{END_DATE_SQL}` to keep queries incremental. For `product_usage` snapshots, BigQuery's `DATE()` and `TIMESTAMP_TRUNC()` work well with these placeholders.
* **Cost control.** BigQuery bills per byte scanned. Partition filters (`WHERE _PARTITIONDATE BETWEEN ...`) and clustered-column filters dramatically reduce cost on large tables — push them into the saved query rather than relying on Statisfy to discard rows.
* **Streaming buffer.** Rows in the streaming buffer (`_PARTITIONTIME IS NULL`) may not appear in scheduled queries that filter on a partition column. If you depend on near-real-time data, either query unpartitioned tables or filter to allow `IS NULL`.

### Account Matching

| Resolver                | Description                              |
| ----------------------- | ---------------------------------------- |
| **Statisfy Account ID** | Native Statisfy account identifier       |
| **CRM Account ID**      | Salesforce or HubSpot account identifier |
| **Organization ID**     | External org identifier                  |
| **Email Domain**        | Company website domain                   |
| **Account Name**        | Company name (fuzzy matching)            |
| **Custom Field**        | Any custom field defined on accounts     |

<Warning>
  Rows that don't resolve to an existing Statisfy account are skipped. For CRM ID resolvers, make sure your CRM integration has synced the matching accounts first.
</Warning>

For `custom_field` queries that target people instead of accounts, configure a **person resolver** (email column, optionally name) — Statisfy will upsert the person record by email.

### Sync Behavior

**Scheduled Jobs:**

* Each saved query runs on its own cadence.
* The first run after connecting is a **bootstrap** that covers the last 30 days; subsequent runs cover only the active window for the cadence.
* Self-serve configuration takes precedence. If no self-serve config exists for a query type, Statisfy falls back to any built-in mapping set up by support.

**Error Handling:**

* Per-query errors are logged but do not block other queries from running.
* Invalid rows within a query are skipped and reported in the job audit log.

### Network Access

BigQuery is reachable over the public internet, so no IP allowlisting is required. The service account credential is the only authentication step.

### Security

* The service account JSON is stored encrypted in Google Cloud Secret Manager.
* Queries authored in the Settings page are read-only — Statisfy validates the query against the BigQuery Standard SQL dialect and rejects DDL/DML before save.
* Access is scoped to whatever the service account can see — grant `dataViewer` only on the datasets you want Statisfy to read.

### Troubleshooting

<Accordion title="Connection test fails">
  **Check the service account:**

  * Verify the JSON key file is the latest version (rotated keys invalidate older copies)
  * Confirm the service account has `bigquery.jobUser` on the default project (needed to execute queries)
  * Confirm the service account has `bigquery.dataViewer` on the dataset(s) you want to query
</Accordion>

<Accordion title="Schema or tables not visible in the browser">
  * Confirm the service account has `dataViewer` (or higher) on the dataset
  * Information schema views (`INFORMATION_SCHEMA.*`) are intentionally hidden from the picker
</Accordion>

<Accordion title="Query validation errors at save time">
  * Statisfy validates queries against BigQuery Standard SQL. Legacy SQL syntax (`#legacySQL`) is not supported — switch to Standard SQL.
  * DDL (`CREATE`, `DROP`) and DML (`INSERT`, `UPDATE`, `DELETE`, `MERGE`) are rejected — Statisfy only runs read-only queries.
  * Check the validation error for the offending function or syntax.
</Accordion>

<Accordion title="Records not matching to accounts">
  * Verify the resolver column contains values that exist in Statisfy
  * For CRM ID resolvers, confirm your CRM integration has synced the matching accounts
  * Try a different resolver (switch from account name to email domain) if matching is unreliable
  * Add a `regex_extract` or `strip_prefix` transformer if the column contains the right ID with extra characters
</Accordion>

<Accordion title="Cost is higher than expected">
  * BigQuery bills per byte scanned. Use the **Query Plan** preview to see scan size before saving.
  * Add partition filters (`_PARTITIONDATE`, `_PARTITIONTIME`) and use `{START_DATE}` / `{END_DATE_SQL}` to keep scans incremental.
  * Drop the cadence to `DAILY` so each run scans a smaller window.
</Accordion>

### Need Help?

For initial setup, query design, or troubleshooting, contact [support@statisfy.com](mailto:support@statisfy.com).
