BigQuery MCP Server¶
Production-ready BigQuery MCP server for secure, controlled access to Google BigQuery datasets with comprehensive authentication, access controls, and advanced querying capabilities.
This template provides a robust interface to Google BigQuery, enabling AI assistants and automation tools to query datasets, explore schemas, and analyze data with enterprise-grade security features.
Overview¶
The BigQuery MCP Server offers a comprehensive interface to Google BigQuery, providing secure access to datasets with configurable authentication methods and fine-grained access controls. It's designed for production environments where data security and access governance are paramount.
Key Features¶
- 🔐 Multiple Authentication Methods: Service Account, OAuth2, Application Default Credentials
- 🛡️ Security First: Read-only mode by default with explicit warnings for write operations
- 🎯 Advanced Access Control: Dataset filtering via patterns, regex, and wildcards
- 📊 Schema Discovery: Comprehensive table and dataset exploration capabilities
- ⚡ Query Execution: SQL query execution with safety limits, dry-run validation, and monitoring
- 🔍 Job Management: BigQuery job status tracking and monitoring
- 🚀 Enterprise Ready: Docker support, health checks, and production deployment patterns
Architecture¶
The BigQuery MCP server follows a secure, modular architecture:
- Authentication Layer: Supports multiple Google Cloud authentication methods
- Access Control Engine: Configurable dataset filtering and permission management
- Query Engine: SQL execution with safety controls and result limiting
- Schema Browser: Dataset and table exploration with metadata retrieval
- Job Monitor: BigQuery job status tracking and management
- Transport Layer: Supports HTTP and stdio protocols for flexible integration
Quick Start¶
Application Default Credentials (Development)¶
Perfect for local development and testing:
# Set up Google Cloud credentials
gcloud auth application-default login
# Deploy the BigQuery MCP server
python -m mcp_platform deploy bigquery \
  --config project_id=my-gcp-project
Service Account (Production)¶
Recommended for production environments:
# Deploy with service account authentication
python -m mcp_platform deploy bigquery \
  --config project_id=my-gcp-project \
  --config auth_method=service_account \
  --config service_account_path=/path/to/service-account.json
With Security Controls¶
Deploy with dataset restrictions and read-only mode:
python -m mcp_platform deploy bigquery \
  --config project_id=my-gcp-project \
  --config allowed_datasets="analytics_*,public_data" \
  --config read_only=true \
  --config max_results=500
Usage & API Reference¶
For comprehensive usage examples, tool documentation, and integration guides, see the template's usage guide, which includes:
- Available Tools: complete list of tools with parameters and examples
- Integration Examples: Python, JavaScript, and CLI usage
- HTTP API: REST endpoint documentation
- Configuration: setup and deployment options
- Best Practices: tips for optimal usage
Configuration¶
Environment Variables¶
| Variable | Description | Default | Required |
|---|---|---|---|
| `GOOGLE_CLOUD_PROJECT` | Google Cloud project ID | - | Yes |
| `BIGQUERY_AUTH_METHOD` | Authentication method (`service_account`, `oauth2`, `application_default`) | `application_default` | No |
| `GOOGLE_APPLICATION_CREDENTIALS` | Service account key file path | - | If using service account |
| `BIGQUERY_READ_ONLY` | Enable read-only mode (blocks data modifications) | `true` | No |
| `BIGQUERY_ALLOWED_DATASETS` | Dataset access patterns (comma-separated) | `*` | No |
| `BIGQUERY_DATASET_REGEX` | Advanced regex filter for datasets | - | No |
| `BIGQUERY_QUERY_TIMEOUT` | Maximum query execution time (seconds) | `300` | No |
| `BIGQUERY_MAX_RESULTS` | Maximum rows returned per query | `1000` | No |
| `MCP_LOG_LEVEL` | Logging level (`debug`, `info`, `warning`, `error`) | `info` | No |
Authentication Methods¶
1. Application Default Credentials (Recommended for Development)¶
Uses your local Google Cloud SDK credentials:
# Set up ADC
gcloud auth application-default login
# Verify setup
gcloud auth application-default print-access-token
2. Service Account (Recommended for Production)¶
Create and configure a service account for production use:
# Create service account
gcloud iam service-accounts create bigquery-mcp \
  --description="BigQuery MCP Server" \
  --display-name="BigQuery MCP"

# Grant BigQuery permissions
gcloud projects add-iam-policy-binding MY_PROJECT \
  --member="serviceAccount:bigquery-mcp@MY_PROJECT.iam.gserviceaccount.com" \
  --role="roles/bigquery.user"

# For job creation (optional)
gcloud projects add-iam-policy-binding MY_PROJECT \
  --member="serviceAccount:bigquery-mcp@MY_PROJECT.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Download service account key
gcloud iam service-accounts keys create service-account.json \
  --iam-account=bigquery-mcp@MY_PROJECT.iam.gserviceaccount.com
3. OAuth2 (Interactive)¶
For interactive user-based authentication:
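A minimal sketch, assuming OAuth2 is selected with the same `auth_method` configuration key used by the other methods (the server then drives the interactive browser-based login):

# Deploy with OAuth2 authentication (interactive browser flow)
python -m mcp_platform deploy bigquery \
  --config project_id=my-gcp-project \
  --config auth_method=oauth2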
Access Control Configuration¶
Pattern-Based Dataset Filtering¶
Control dataset access using glob patterns:
# Allow specific datasets
--config allowed_datasets="analytics_prod,reporting_*,public_data"
# Allow all analytics datasets
--config allowed_datasets="analytics_*"
# Allow multiple patterns
--config allowed_datasets="prod_*,staging_analytics,public_*"
# Deny all datasets (for testing)
--config allowed_datasets=""
Regex-Based Filtering¶
For advanced filtering requirements:
# Only production datasets
--config dataset_regex="^(prod|production)_.*$"
# Exclude sensitive datasets
--config dataset_regex="^(?!.*(?:sensitive|private|internal)).*$"
# Allow specific prefixes
--config dataset_regex="^(analytics|reporting|public)_.*$"
Security Configuration¶
The server operates in read-only mode by default, blocking all data modification operations:
# Explicitly enable read-only mode (default)
--config read_only=true
# ⚠️ WARNING: Enable write mode (requires explicit acknowledgment)
--config read_only=false
Blocked Operations in Read-Only Mode:
- INSERT, UPDATE, DELETE statements
- CREATE, DROP, ALTER operations
- TRUNCATE, MERGE operations
- Any DDL or DML statements
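For example, with read-only mode enabled, a DML statement such as the following is rejected before it reaches BigQuery (an illustration; the exact error text depends on the server version):

mcpt interactive
mcpt> call bigquery execute_query '{
  "query": "DELETE FROM `my-project.analytics.events` WHERE event_name = \"test\""
}'
# Fails in read-only mode with an error indicating write operations are disabled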
Available Tools¶
list_datasets¶
List all accessible BigQuery datasets in the project, filtered by access controls.
Parameters: None
Returns: Array of datasets with metadata including ID, friendly name, location, and creation time.
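Example:

mcpt interactive
mcpt> call bigquery list_datasets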
list_tables¶
List tables in a specific dataset with metadata.
Parameters:
- `dataset_id` (string, required): Dataset ID to list tables from
Returns: Array of tables with schema information and statistics.
Example:
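mcpt interactive
mcpt> call bigquery list_tables '{"dataset_id": "analytics"}'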
describe_table¶
Get detailed schema information for a specific table.
Parameters:
- `dataset_id` (string, required): Dataset ID containing the table
- `table_id` (string, required): Table ID to describe
Returns: Complete table schema with column types, descriptions, and constraints.
Example:
mcpt interactive
mcpt> call bigquery describe_table '{
  "dataset_id": "analytics_prod",
  "table_id": "user_events"
}'
execute_query¶
Execute SQL queries against BigQuery with safety controls and result limiting.
Parameters:
- `query` (string, required): SQL query to execute
- `dry_run` (boolean, optional): Validate query without executing (default: false)
Returns: Query results with metadata, including row count and execution statistics.
Example:
# Execute query
mcpt interactive
mcpt> call bigquery execute_query '{
  "query": "SELECT COUNT(*) as total_events FROM `my-project.analytics.events` WHERE DATE(timestamp) = CURRENT_DATE()"
}'

# Dry run validation
mcpt interactive
mcpt> call bigquery execute_query '{
  "query": "SELECT user_id, COUNT(*) FROM `my-project.analytics.events` GROUP BY user_id",
  "dry_run": true
}'
get_job_status¶
Monitor BigQuery job status and execution progress.
Parameters:
- `job_id` (string, required): BigQuery job ID to check
Returns: Job status, progress, and error information if applicable.
Example:
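The job ID below is a placeholder; substitute a real job ID from your project (for example, one surfaced in query execution metadata or the BigQuery console).

mcpt interactive
mcpt> call bigquery get_job_status '{"job_id": "YOUR_JOB_ID"}'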
get_dataset_info¶
Get comprehensive information about a dataset.
Parameters:
- `dataset_id` (string, required): Dataset ID to get information for
Returns: Dataset metadata including location, description, access time, and configuration.
Example:
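mcpt interactive
mcpt> call bigquery get_dataset_info '{"dataset_id": "analytics"}'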
Data Exploration¶
# List all accessible datasets
mcpt interactive
mcpt> call bigquery list_datasets
# Explore a specific dataset
mcpt interactive
mcpt> call bigquery list_tables '{"dataset_id": "analytics"}'
# Get table schema
mcpt interactive
mcpt> call bigquery describe_table '{
  "dataset_id": "analytics",
  "table_id": "user_events"
}'
Analytics Queries¶
# Daily active users
mcpt interactive
mcpt> call bigquery execute_query '{
  "query": "SELECT DATE(timestamp) as date, COUNT(DISTINCT user_id) as dau FROM `my-project.analytics.events` WHERE DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) GROUP BY DATE(timestamp) ORDER BY date"
}'
# Top events by volume
mcpt interactive
mcpt> call bigquery execute_query '{
  "query": "SELECT event_name, COUNT(*) as event_count FROM `my-project.analytics.events` WHERE DATE(timestamp) = CURRENT_DATE() GROUP BY event_name ORDER BY event_count DESC LIMIT 10"
}'
Query Validation¶
# Validate complex query before execution
mcpt interactive
mcpt> call bigquery execute_query '{
  "query": "WITH user_metrics AS (SELECT user_id, COUNT(*) as session_count, AVG(session_duration) as avg_duration FROM `my-project.analytics.sessions` GROUP BY user_id) SELECT * FROM user_metrics WHERE session_count > 10",
  "dry_run": true
}'
Transport Modes¶
HTTP Transport (Default)¶
Perfect for web applications and API integration:
# Start HTTP server
python -m mcp_platform deploy bigquery \
  --config project_id=my-project \
  --config mcp_transport=http \
  --config mcp_port=7090

# Access via HTTP
curl http://localhost:7090/health
Stdio Transport¶
Ideal for Claude Desktop and direct MCP clients:
# Run in stdio mode
python server.py --project-id my-project --transport stdio
# Docker stdio mode
docker run -i --rm \
  -e GOOGLE_CLOUD_PROJECT=my-project \
  -e GOOGLE_APPLICATION_CREDENTIALS=/creds/service-account.json \
  -v /path/to/service-account.json:/creds/service-account.json:ro \
  dataeverything/mcp-bigquery:latest
Claude Desktop¶
Add to your Claude Desktop configuration:
{
  "mcpServers": {
    "bigquery": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "GOOGLE_CLOUD_PROJECT=my-project",
        "-e", "GOOGLE_APPLICATION_CREDENTIALS=/creds/service-account.json",
        "-v", "/path/to/service-account.json:/creds/service-account.json:ro",
        "dataeverything/mcp-bigquery:latest"
      ]
    }
  }
}
Python Client¶
from fastmcp.client import FastMCPClient
import asyncio

async def bigquery_analytics():
    # Connect to BigQuery MCP server
    client = FastMCPClient(endpoint='http://localhost:7090')

    # List available datasets
    datasets = await client.call('list_datasets')
    print(f"Found {len(datasets['datasets'])} datasets")

    # Explore analytics dataset
    tables = await client.call('list_tables', dataset_id='analytics')
    print(f"Analytics dataset has {len(tables['tables'])} tables")

    # Get daily active users
    dau_query = """
    SELECT
      DATE(timestamp) as date,
      COUNT(DISTINCT user_id) as daily_active_users
    FROM `my-project.analytics.events`
    WHERE DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
    GROUP BY DATE(timestamp)
    ORDER BY date
    """
    result = await client.call('execute_query', query=dau_query)
    print(f"DAU analysis returned {result['num_rows']} days of data")

    # Analyze table schema
    schema = await client.call('describe_table',
                               dataset_id='analytics',
                               table_id='events')
    print(f"Events table has {len(schema['schema'])} columns")

    await client.close()

# Run the analytics
asyncio.run(bigquery_analytics())
JavaScript/Node.js Client¶
const axios = require('axios');

class BigQueryMCPClient {
  constructor(endpoint = 'http://localhost:7090') {
    this.endpoint = endpoint;
  }

  async call(method, params = {}) {
    const response = await axios.post(`${this.endpoint}/call`, {
      method,
      params
    });
    return response.data;
  }

  async getDatasetOverview(datasetId) {
    const [tables, info] = await Promise.all([
      this.call('list_tables', { dataset_id: datasetId }),
      this.call('get_dataset_info', { dataset_id: datasetId })
    ]);
    return {
      info,
      tables: tables.tables,
      tableCount: tables.tables.length
    };
  }

  async runAnalyticsQuery(query) {
    // Validate first
    await this.call('execute_query', { query, dry_run: true });
    // Execute
    return await this.call('execute_query', { query });
  }
}

// Usage example
async function analyzeUserBehavior() {
  const client = new BigQueryMCPClient();

  const query = `
    SELECT
      event_name,
      COUNT(*) as event_count,
      COUNT(DISTINCT user_id) as unique_users
    FROM \`my-project.analytics.events\`
    WHERE DATE(timestamp) = CURRENT_DATE()
    GROUP BY event_name
    ORDER BY event_count DESC
    LIMIT 10
  `;

  const result = await client.runAnalyticsQuery(query);
  console.log('Top events today:', result.rows);
}
Docker Deployment¶
Using Pre-built Image¶
# Basic deployment
docker run -p 7090:7090 \
  -e GOOGLE_CLOUD_PROJECT=my-project \
  -e GOOGLE_APPLICATION_CREDENTIALS=/creds/service-account.json \
  -v /path/to/service-account.json:/creds/service-account.json:ro \
  dataeverything/mcp-bigquery:latest

# With security controls
docker run -p 7090:7090 \
  -e GOOGLE_CLOUD_PROJECT=my-project \
  -e BIGQUERY_READ_ONLY=true \
  -e BIGQUERY_ALLOWED_DATASETS="analytics_*,public_*" \
  -e BIGQUERY_MAX_RESULTS=500 \
  -e GOOGLE_APPLICATION_CREDENTIALS=/creds/service-account.json \
  -v /path/to/service-account.json:/creds/service-account.json:ro \
  dataeverything/mcp-bigquery:latest
Docker Compose¶
version: '3.8'

services:
  bigquery-mcp:
    image: dataeverything/mcp-bigquery:latest
    ports:
      - "7090:7090"
    environment:
      GOOGLE_CLOUD_PROJECT: my-project
      BIGQUERY_AUTH_METHOD: service_account
      GOOGLE_APPLICATION_CREDENTIALS: /creds/service-account.json
      BIGQUERY_READ_ONLY: "true"
      BIGQUERY_ALLOWED_DATASETS: "analytics_*,public_data"
      BIGQUERY_QUERY_TIMEOUT: "300"
      BIGQUERY_MAX_RESULTS: "1000"
      MCP_LOG_LEVEL: info
    volumes:
      - ./service-account.json:/creds/service-account.json:ro
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:7090/health"]
      interval: 30s
      timeout: 10s
      retries: 3
Kubernetes Deployment¶
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bigquery-mcp
  labels:
    app: bigquery-mcp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: bigquery-mcp
  template:
    metadata:
      labels:
        app: bigquery-mcp
    spec:
      containers:
        - name: bigquery-mcp
          image: dataeverything/mcp-bigquery:latest
          ports:
            - containerPort: 7090
          env:
            - name: GOOGLE_CLOUD_PROJECT
              value: "my-project"
            - name: BIGQUERY_AUTH_METHOD
              value: "service_account"
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: "/creds/service-account.json"
            - name: BIGQUERY_READ_ONLY
              value: "true"
            - name: BIGQUERY_ALLOWED_DATASETS
              value: "analytics_*,public_*"
          volumeMounts:
            - name: service-account
              mountPath: /creds
              readOnly: true
          livenessProbe:
            httpGet:
              path: /health
              port: 7090
            initialDelaySeconds: 30
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 7090
            initialDelaySeconds: 10
            periodSeconds: 10
      volumes:
        - name: service-account
          secret:
            secretName: bigquery-service-account
---
apiVersion: v1
kind: Service
metadata:
  name: bigquery-mcp-service
spec:
  selector:
    app: bigquery-mcp
  ports:
    - port: 7090
      targetPort: 7090
  type: ClusterIP
Security Best Practices¶
1. Use Read-Only Mode¶
Always operate in read-only mode unless write operations are explicitly required:
# Default and recommended
--config read_only=true
# Only enable writes when absolutely necessary
--config read_only=false # ⚠️ Use with caution
2. Implement Dataset Access Controls¶
Restrict access to only necessary datasets:
# Specific datasets only
--config allowed_datasets="prod_analytics,public_reference"
# Pattern-based restrictions
--config allowed_datasets="prod_*,staging_analytics"
# Advanced regex filtering
--config dataset_regex="^(prod|production)_.*$"
3. Use Service Accounts in Production¶
Configure service accounts with minimal required permissions:
# Required IAM roles for read operations
gcloud projects add-iam-policy-binding MY_PROJECT \
  --member="serviceAccount:bigquery-mcp@MY_PROJECT.iam.gserviceaccount.com" \
  --role="roles/bigquery.user"

# For job creation and monitoring
gcloud projects add-iam-policy-binding MY_PROJECT \
  --member="serviceAccount:bigquery-mcp@MY_PROJECT.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Optional: For specific dataset access
gcloud projects add-iam-policy-binding MY_PROJECT \
  --member="serviceAccount:bigquery-mcp@MY_PROJECT.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"
4. Configure Query Limits¶
Set appropriate limits to prevent resource abuse:
# Limit query execution time
--config query_timeout=300 # 5 minutes
# Limit result size
--config max_results=1000 # Maximum 1000 rows
# Combined security configuration
--config query_timeout=180 --config max_results=500 --config read_only=true
5. Enable Monitoring and Logging¶
Configure comprehensive logging for security monitoring:
# Enable detailed logging
--config log_level=info
# For debugging (contains sensitive information)
--config log_level=debug # Use only during troubleshooting
6. Network Security¶
Implement network-level security controls:
# Bind to localhost only (for local deployment)
docker run -p 127.0.0.1:7090:7090 dataeverything/mcp-bigquery
# Use reverse proxy with authentication
# nginx, Traefik, or cloud load balancer with auth
Troubleshooting¶
Authentication Issues¶
Could not automatically determine credentials¶
# Solution 1: Set up Application Default Credentials
gcloud auth application-default login
# Solution 2: Verify service account file
ls -la /path/to/service-account.json
cat /path/to/service-account.json | jq .type # Should return "service_account"
# Solution 3: Check environment variables
echo $GOOGLE_APPLICATION_CREDENTIALS
echo $GOOGLE_CLOUD_PROJECT
Permission denied errors¶
# Check current authentication
gcloud auth list
# Verify project access
gcloud projects describe $GOOGLE_CLOUD_PROJECT
# Test BigQuery access
bq ls --project_id=$GOOGLE_CLOUD_PROJECT
# Grant required permissions
gcloud projects add-iam-policy-binding $GOOGLE_CLOUD_PROJECT \
  --member="user:your-email@domain.com" \
  --role="roles/bigquery.user"
Dataset Access Issues¶
Access to dataset 'dataset_name' is not allowed¶
# Check current configuration
curl http://localhost:7090/health
# Update allowed datasets
--config allowed_datasets="dataset_name,other_dataset"
# Use wildcard patterns
--config allowed_datasets="analytics_*,dataset_name"
# Check dataset exists and you have access
bq ls --project_id=$GOOGLE_CLOUD_PROJECT dataset_name
Query Execution Issues¶
Query timeout exceeded¶
# Increase timeout
--config query_timeout=600 # 10 minutes
# Optimize query (add LIMIT clause)
SELECT * FROM large_table LIMIT 1000
# Use dry_run to validate without executing
mcpt> call bigquery execute_query '{"query": "...", "dry_run": true}'
Result size exceeded maximum¶
# Increase max_results
--config max_results=5000
# Add LIMIT to query
SELECT * FROM table ORDER BY timestamp DESC LIMIT 1000
# Use pagination in application logic
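A short sketch of pagination in application logic, reusing the client pattern from the Python example above (the `rows` result key is an assumption carried over from the JavaScript example; LIMIT/OFFSET paging is simple but inefficient on very large tables):

from fastmcp.client import FastMCPClient
import asyncio

async def fetch_all_rows(table, page_size=500):
    # Page through results with LIMIT/OFFSET so each call stays
    # under the server's max_results cap.
    client = FastMCPClient(endpoint='http://localhost:7090')
    rows, offset = [], 0
    while True:
        result = await client.call(
            'execute_query',
            query=f"SELECT * FROM `{table}` LIMIT {page_size} OFFSET {offset}")
        batch = result.get('rows', [])  # 'rows' key assumed, as in the JS example
        rows.extend(batch)
        if len(batch) < page_size:
            break  # final, partial page
        offset += page_size
    await client.close()
    return rows

rows = asyncio.run(fetch_all_rows('my-project.analytics.events'))
print(f"Fetched {len(rows)} rows")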
Container Issues¶
Container won't start¶
# Check Docker logs
docker logs container_id
# Verify environment variables
docker run --rm dataeverything/mcp-bigquery env | grep BIGQUERY
# Test with minimal configuration
docker run --rm -e GOOGLE_CLOUD_PROJECT=test dataeverything/mcp-bigquery
Health check failures¶
# Check health endpoint
curl -v http://localhost:7090/health
# Verify port binding
netstat -tulpn | grep 7090
# Check container status
docker ps
docker inspect container_id
Debug Mode¶
Enable detailed logging for troubleshooting:
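# Deploy with debug logging (verbose; may include sensitive details)
python -m mcp_platform deploy bigquery \
  --config project_id=my-project \
  --config log_level=debug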
Performance Optimization¶
Query Performance¶
- Use appropriate LIMIT clauses to restrict result sizes
- Leverage BigQuery partitioning in your queries
- Use dry_run to validate complex queries before execution
- Monitor query costs with BigQuery console
Connection Management¶
- Use persistent connections with keep-alive settings
- Implement connection pooling in client applications
- Configure appropriate timeouts based on query complexity
Memory Management¶
- Set reasonable max_results limits to control memory usage
- Monitor container memory in production deployments
- Use streaming results for large datasets when possible
Health Monitoring¶
The server provides comprehensive health check endpoints:
# Basic health check
curl http://localhost:7090/health

# Detailed status (includes authentication and configuration)
curl http://localhost:7090/health | jq .
Health Check Response:
{
  "status": "healthy",
  "timestamp": "2024-01-15T10:30:00Z",
  "bigquery_connection": "connected",
  "authentication_method": "service_account",
  "project_id": "my-project",
  "read_only_mode": true,
  "allowed_datasets": ["analytics_*", "public_*"],
  "server_info": {
    "version": "1.0.0",
    "transport": "http",
    "port": 7090
  }
}
Contributing¶
This template follows the MCP Platform contribution guidelines:
- Code Quality: Follow PEP 8 and use type hints
- Testing: Add comprehensive tests for new features
- Documentation: Update both README.md and this user guide
- Security: Consider security implications of all changes
See CONTRIBUTING.md for detailed guidelines.
Support¶
- Template Issues: MCP Platform Repository
- Feature Requests: GitHub Discussions
- Documentation: MCP Platform Docs
- Community: Discord Server
License¶
This template is part of the MCP Platform and is licensed under the Elastic License 2.0. See LICENSE for details.
Configuration Options¶
| Property | Type | Environment Variable | Default | Description |
|---|---|---|---|---|
| `project_id` | string | `GOOGLE_CLOUD_PROJECT` | - | Google Cloud project ID containing BigQuery datasets |
| `auth_method` | string | `BIGQUERY_AUTH_METHOD` | `application_default` | Method for authenticating with Google Cloud |
| `service_account_path` | string | `GOOGLE_APPLICATION_CREDENTIALS` | - | Path to service account JSON key file (required if `auth_method` is `service_account`) |
| `read_only` | boolean | `BIGQUERY_READ_ONLY` | `true` | Enable read-only mode to prevent data modifications (recommended for security) |
| `allowed_datasets` | string | `BIGQUERY_ALLOWED_DATASETS` | `*` | Comma-separated list of dataset patterns. Use `*` for all datasets, or specify patterns like `analytics_*,public_data` |
| `dataset_regex` | string | `BIGQUERY_DATASET_REGEX` | - | Optional regex pattern for advanced dataset filtering. Takes precedence over `allowed_datasets` if provided |
| `query_timeout` | integer | `BIGQUERY_QUERY_TIMEOUT` | `300` | Maximum time to wait for query execution (seconds) |
| `max_results` | integer | `BIGQUERY_MAX_RESULTS` | `1000` | Maximum number of rows to return from queries |
| `log_level` | string | `MCP_LOG_LEVEL` | `info` | Logging level for the server |