BigLake Metastore
ClickHouse supports integration with multiple catalogs (Unity, Glue, Polaris, etc.). This guide will walk you through the steps to query your Iceberg tables in BigLake Metastore via ClickHouse.
Because this feature is in beta, you must first enable it with:
SET allow_database_iceberg = 1;
Prerequisites
Before creating a connection from ClickHouse to BigLake Metastore, ensure you have:
- A Google Cloud project with BigLake Metastore enabled
- Application Default Credentials (OAuth client ID and client secret) for an application, created via Google Cloud Console
- A refresh token obtained by completing the OAuth flow with the appropriate scopes (e.g. https://www.googleapis.com/auth/bigquery and a storage scope for GCS)
- A warehouse path: a GCS bucket (and optional prefix) where your tables are stored, e.g. gs://your-bucket or gs://your-bucket/prefix
Creating a connection between BigLake Metastore and ClickHouse
With the OAuth credentials in place, create a database in ClickHouse that uses the DataLakeCatalog database engine:
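A minimal sketch of such a statement follows. The database name biglake_metastore, the REST endpoint URL, and the google_adc_* setting names are assumptions that are not confirmed by this page, so check them against the DataLakeCatalog engine reference for your ClickHouse version. Every value in angle brackets is a placeholder to replace with your own.

```sql
-- Beta feature flag, as described at the top of this guide.
SET allow_database_iceberg = 1;

-- Assumed endpoint and setting names; verify before use.
CREATE DATABASE biglake_metastore
ENGINE = DataLakeCatalog('https://biglake.googleapis.com/iceberg/v1/restcatalog')
SETTINGS
    catalog_type = 'biglake',
    google_adc_client_id = '<client-id>',            -- OAuth client ID
    google_adc_client_secret = '<client-secret>',    -- OAuth client secret
    google_adc_refresh_token = '<refresh-token>',    -- token from the OAuth flow
    google_adc_quota_project_id = '<gcp-project-id>',
    warehouse = 'gs://<your-bucket>/<optional-prefix>';
```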
Querying BigLake Metastore tables using ClickHouse
Once the connection is created, you can query tables registered in the BigLake Metastore.
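As an illustration, the queries below list the tables in the catalog and read a few rows from one of them. The database name biglake_metastore and the table my_namespace.my_table are hypothetical placeholders. Substitute the names from your own catalog.

```sql
-- List tables exposed by the catalog (names appear as namespace.table).
SHOW TABLES FROM biglake_metastore;

-- Read a few rows; the namespace.table name is wrapped in backticks.
SELECT *
FROM biglake_metastore.`my_namespace.my_table`
LIMIT 5;
```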
Example output:
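Backticks are required around the namespace.table name because ClickHouse doesn't support more than one namespace level: the catalog namespace and the table name are exposed together as a single table identifier that contains a dot.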
To inspect the table definition:
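A short sketch, again assuming a hypothetical database biglake_metastore and table my_namespace.my_table:

```sql
-- Show the column definitions and engine ClickHouse derived from the catalog.
SHOW CREATE TABLE biglake_metastore.`my_namespace.my_table`;
```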
Loading data from BigLake into ClickHouse
To load data from a BigLake Metastore table into a local ClickHouse table for faster repeated queries, create a MergeTree table and insert from the catalog:
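The sketch below shows one way this could look. The columns id, name, and event_time, the database biglake_metastore, and the source table my_namespace.my_table are all hypothetical. Adjust the column list and the ORDER BY key to match your actual schema and query patterns.

```sql
-- Local table with hypothetical columns; mirror your Iceberg schema here.
CREATE TABLE clickhouse_table
(
    id Int64,
    name String,
    event_time DateTime64(6)
)
ENGINE = MergeTree
ORDER BY id;

-- Copy the data from the catalog table into the local table.
INSERT INTO clickhouse_table
SELECT id, name, event_time
FROM biglake_metastore.`my_namespace.my_table`;
```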
After the initial load, query clickhouse_table for lower latency. Re-run the INSERT INTO ... SELECT to refresh data from BigLake when needed.
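Note that an INSERT into a MergeTree table appends rows, so re-running it on its own would duplicate data that was already loaded. One simple option for a full refresh, sketched here with the same hypothetical column and source table names (id, name, event_time, biglake_metastore, my_namespace.my_table), is to empty the local table first:

```sql
-- Remove previously loaded rows so the reload does not duplicate them.
TRUNCATE TABLE clickhouse_table;

-- Reload the current contents of the catalog table.
INSERT INTO clickhouse_table
SELECT id, name, event_time
FROM biglake_metastore.`my_namespace.my_table`;
```

The local table is briefly empty between the two statements, so this approach suits cases where a short gap in query results is acceptable.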