Google BigQuery
Send tracked database events to Google BigQuery for analytics and data warehousing
Overview
Google BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse. pg_track_events supports sending data to BigQuery in two formats:
- Processed Events - Transformed analytics events with defined event names and properties
- Raw DB Events - Raw database change events showing the exact changes made to your database
Processed Events Destination
This destination sends your transformed analytics events to BigQuery for analysis and reporting.
Configuration
To configure BigQuery as a destination for your processed events, add the following to your pg_track_events.config.yaml file:
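The exact layout of the config file is not shown on this page, so the snippet below is only a sketch: the surrounding structure (a destinations list with a type key) and the environment-variable reference syntax are assumptions; filter, tableId, and credentialsJson are the option names documented below.

```yaml
# Sketch only: the destinations/type structure is assumed.
# filter, tableId, and credentialsJson are the documented options.
destinations:
  - type: bigquery
    filter: "user.*"                        # optional event-name glob
    tableId: "my-project.analytics.events"  # project_id.dataset_id.table_name
    credentialsJson: ${GOOGLE_CREDENTIALS_JSON}  # or paste the service account JSON inline
```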
Configuration Options
- filter (optional): Event name glob filter. Learn about filtering events.
- tableId (required): Your BigQuery table ID in the format project_id.dataset_id.table_name.
- credentialsJson (required): Your Google Cloud service account credentials JSON. Can be hardcoded directly in the YAML or reference an environment variable.
Table Structure
Your BigQuery table should have the following schema:
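The schema itself is not reproduced on this page. As a hedged illustration, a table along these lines would fit the events described here; user_id is the one column this page names explicitly, while the other column names are assumptions:

```sql
-- Illustrative DDL only: user_id is documented below; the other
-- columns are assumed for the sake of example.
CREATE TABLE `my-project.analytics.events` (
  event_name STRING,
  user_id    STRING,    -- the Distinct ID (see the User ID section)
  properties JSON,
  timestamp  TIMESTAMP
);
```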
Raw DB Events Destination
This destination sends raw database change events to BigQuery, allowing you to see exactly what changed in your database.
Configuration
To configure BigQuery as a destination for your raw DB events, add the following to your pg_track_events.config.yaml file:
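As with processed events, the config layout is not shown here, so this is a sketch under the same assumptions; in particular, the type value for the raw destination is hypothetical:

```yaml
# Sketch only: the structure and the type value are assumed.
destinations:
  - type: bigquery_raw   # hypothetical name for the raw DB events destination
    filter: "public.users.*"                      # optional glob
    tableId: "my-project.analytics.raw_db_events" # project_id.dataset_id.table_name
    credentialsJson: ${GOOGLE_CREDENTIALS_JSON}
```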
Configuration Options
- filter (optional): Event name glob filter. Learn about filtering events.
- tableId (required): Your BigQuery table ID in the format project_id.dataset_id.table_name.
- credentialsJson (required): Your Google Cloud service account credentials JSON. Can be hardcoded directly in the YAML or reference an environment variable.
Table Structure
Your BigQuery table for raw DB events should have the following schema:
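The raw-event schema is likewise not reproduced here. A plausible shape for database change events, with every column name an assumption, would be:

```sql
-- Illustrative only: pg_track_events' actual raw-event columns may differ.
CREATE TABLE `my-project.analytics.raw_db_events` (
  operation  STRING,    -- INSERT, UPDATE, or DELETE
  table_name STRING,    -- source Postgres table
  old_row    JSON,      -- row image before the change (UPDATE/DELETE)
  new_row    JSON,      -- row image after the change (INSERT/UPDATE)
  timestamp  TIMESTAMP
);
```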
Setting Up Google BigQuery
Creating a Service Account
- Go to the Google Cloud Console
- Select your project
- Navigate to "IAM & Admin" > "Service Accounts"
- Click "Create Service Account"
- Give it a name and description
- Assign the "BigQuery Data Editor" role
- Create a JSON key and download it
- Store this key securely and use it for the credentialsJson configuration
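The console steps above can also be done with the gcloud CLI. The project and account names below are examples; substitute your own:

```shell
# Example names only; replace my-project with your GCP project ID.
gcloud iam service-accounts create pg-track-events \
  --display-name="pg_track_events BigQuery writer"

# Grant the BigQuery Data Editor role
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:pg-track-events@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"

# Create and download a JSON key for credentialsJson
gcloud iam service-accounts keys create key.json \
  --iam-account=pg-track-events@my-project.iam.gserviceaccount.com
```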
Creating a Dataset and Table
- In the Google Cloud Console, navigate to BigQuery
- Create a new dataset in your preferred location
- Create a table with the schema defined above
- Note the project ID, dataset ID, and table name to form your tableId
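These steps can also be scripted with the bq CLI. The IDs below are examples; the three parts joined by dots form the tableId value (e.g. my-project.analytics.events):

```shell
# Example IDs only; pick the location that fits your requirements.
bq mk --dataset --location=US my-project:analytics
bq mk --table my-project:analytics.events ./schema.json
```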
Locations and Regions
Google BigQuery is available in multiple regions worldwide. When creating your dataset, consider:
- Data residency requirements
- Colocating your BigQuery dataset with other Google Cloud resources
- Regional pricing differences
User ID (Distinct ID)
For processed events, the system automatically extracts a distinct ID from your event properties in the following order:
- A distinct_id property
- Common user ID fields (user_id, userid, _user_id)
- For tables named users, user, or _users, an id field

If no ID is found, the user ID will be null. Note that while other integrations may refer to this ID as Distinct ID, in BigQuery this identifier is stored in the user_id field.
Note
After making configuration changes, restart the pg_track_events agent for them to take effect.