Skip to main content
Skip to main content

BigLake Metastore

Beta feature. Learn more.

ClickHouse supports integration with multiple catalogs (Unity, Glue, Polaris, etc.). This guide will walk you through the steps to query your Iceberg tables in BigLake Metastore via ClickHouse.

Note

As this feature is beta, you will need to enable it using: SET allow_database_iceberg = 1;

Prerequisites

Before creating a connection from ClickHouse to BigLake Metastore, ensure you have:

  • A Google Cloud project with BigLake Metastore enabled
  • Application Default credentials (Oauth client ID and client secret) for an application, created via Google Cloud Console
  • A refresh token obtained by completing the OAuth flow with the appropriate scopes (e.g. https://www.googleapis.com/auth/bigquery and storage scope for GCS)
  • A warehouse path: a GCS bucket (and optional prefix) where your tables are stored, e.g. gs://your-bucket or gs://your-bucket/prefix

Creating a connection between BigLake Metastore and ClickHouse

With the OAuth credentials in place, create a database in ClickHouse that uses the DataLakeCatalog database engine:

SET allow_database_iceberg = 1;

CREATE DATABASE biglake_metastore
ENGINE = DataLakeCatalog('https://biglake.googleapis.com/iceberg/v1/restcatalog')
SETTINGS
    catalog_type = 'biglake',
    google_adc_client_id = '<client-id>',
    google_adc_client_secret = '<client-secret>',
    google_adc_refresh_token = '<refresh-token>',
    google_adc_quota_project_id = '<gcp-project-id>',
    warehouse = 'gs://<bucket_name>/<optional-prefix>';

Querying BigLake Metastore tables using ClickHouse

Once the connection is created, you can query tables registered in the BigLake Metastore.

USE biglake_metastore;

SHOW TABLES;

Example output:

┌─name─────────────────────┐
│icebench.my_iceberg_table │   
└──────────────────────────┘
SELECT count(*) FROM `icebench.my_iceberg_table`;
Backticks required

Backticks are required because ClickHouse doesn't support more than one namespace.

To inspect the table definition:

SHOW CREATE TABLE `icebench.my_iceberg_table`;

Loading data from BigLake into ClickHouse

To load data from a BigLake Metastore table into a local ClickHouse table for faster repeated queries, create a MergeTree table and insert from the catalog:

CREATE TABLE clickhouse_table
(
    `id` Int64,
    `event_time` DateTime64(3),
    `user_id` String,
    `payload` String
)
ENGINE = MergeTree
ORDER BY (event_time, id);

INSERT INTO local_events
SELECT * FROM biglake_metastore.`icebench.my_iceberg_table`;

After the initial load, query clickhouse_table for lower latency. Re-run the INSERT INTO ... SELECT to refresh data from BigLake when needed.