Meet the ClickHouse Dataset Provider for Cribl Search

Last edited: February 20, 2025

If you’re working with large datasets, you know the pain of sluggish queries and delayed insights. Traditional database solutions can struggle under the weight of massive data volumes, limiting your ability to act on time-sensitive information in real time.

The ClickHouse Dataset Provider for Cribl Search directly addresses this pain point. By integrating ClickHouse, a high-performance columnar database designed for large-scale analytics, with Cribl Search's powerful querying capabilities, we've created a solution that:

  1. Accelerates query performance on massive datasets

  2. Enables real-time analytics on high-frequency data

  3. Facilitates deep exploration of historical data without sacrificing speed

  4. Reduces infrastructure costs by eliminating the need to move data before analysis

With the ClickHouse integration in Cribl Search, users now have a powerful way to tackle even the most data-intensive challenges, from monitoring streaming data to digging deep into historical records.

  • Real-time analytics - Get instant access to high-frequency data, so you can act faster and stay ahead. Whether it’s monitoring web logs or transaction records, ClickHouse enables lightning-fast queries right in Cribl Search for real-time decision-making.

  • Operational visibility - Continuously monitor large-scale operational datasets with ease. Cribl Search empowers you to query and analyze data from multiple sources in place, leveraging its federated search model to access data directly where it resides, whether that’s in ClickHouse, cloud storage, or data lakes. This approach eliminates the need to move data first, saving both time and infrastructure costs.

    By combining Cribl Search with Cribl Stream, you can route, shape, and transform this data in real time as you bring it into Cribl Search for operational insights. Stream, Search, and ClickHouse come together to help you quickly detect anomalies, optimize resource usage, and ensure that critical data flows are uninterrupted.

  • Historical data exploration - Dive deep into extensive historical datasets to uncover long-term trends and insights. With ClickHouse’s ability to handle complex queries at scale, you can analyze years of data with Cribl Search and extract actionable intelligence without bottlenecks.

Getting Started with ClickHouse Dataset Provider

Step 1: Connect to ClickHouse

To start searching ClickHouse databases from Cribl Search, you'll need to:

  1. Set up a ClickHouse Dataset Provider in Cribl Search, which stores the access credentials and connection details for your ClickHouse server or cloud service.

  2. Set up a ClickHouse Dataset, which defines the specific table, view, or query within the ClickHouse database that Cribl Search will query.

Each dataset provider can have multiple datasets assigned to it.

Add a ClickHouse Dataset Provider

Before adding a ClickHouse Dataset Provider, ensure that:

  • Your ClickHouse server or cloud service has the HTTP interface enabled, preferably over HTTPS on port 8443.

  • You have your ClickHouse URL, username, and password ready.
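Before you fill in the provider form, it can help to confirm that the endpoint and credentials actually work. The Python sketch below is a minimal check, built on the standard library only. It sends SELECT 1 to the ClickHouse HTTP interface using HTTP Basic authentication. It is not part of Cribl Search. The function names are hypothetical, and the host, port, and credentials you pass in are your own.

```python
import base64
import urllib.parse
import urllib.request


def build_check_request(endpoint: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request that runs SELECT 1 through the ClickHouse HTTP interface."""
    # quote_via=quote encodes the space as %20 rather than "+".
    query = urllib.parse.urlencode({"query": "SELECT 1"}, quote_via=urllib.parse.quote)
    url = f"{endpoint.rstrip('/')}/?{query}"
    # The ClickHouse HTTP interface accepts HTTP Basic authentication.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def check_connection(endpoint: str, username: str, password: str, timeout: float = 10.0) -> bool:
    """Return True if the server answers SELECT 1 with a body of 1.

    Non-2xx responses (for example, rejected credentials) raise
    urllib.error.HTTPError from urlopen.
    """
    request = build_check_request(endpoint, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # The HTTP interface returns TabSeparated output by default, so the body is "1\n".
        return response.read().decode().strip() == "1"
```

For example, check_connection("https://your-clickhouse-host:8443", "your-user", "your-password") should return True when the HTTP interface is reachable and the credentials are accepted. All three values are placeholders.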

Steps to Add a ClickHouse Dataset Provider:

  1. Log in to Cribl Search as an Admin or Editor Search Member.

  2. Go to Data > Dataset Provider, then click Add Provider.

  3. In the ID field, enter a unique identifier (e.g., my_clickhouse_database). This ID will be used to reference the dataset provider when adding datasets.

  4. In Description, provide a brief explanation of the dataset provider for easier identification.

  5. Set the Dataset Provider Type to ClickHouse.

  6. Enter the following details:

    • Username: Your ClickHouse username for authentication.

    • Password: Your ClickHouse password for authentication.

    • Endpoint: The URL of your ClickHouse server or cloud service.

  7. Click Save to complete the setup.

Add a ClickHouse Dataset

Once you’ve set up a ClickHouse Dataset Provider, you can create specific datasets within Cribl Search to run queries on.

Steps to Add a ClickHouse Dataset:

  1. Go to Data > Datasets, then click Add Dataset.

  2. In the ID field, enter a unique identifier (e.g., clickhouse_dataset_ID). This is the ID used to reference the dataset (e.g., dataset="clickhouse_dataset_ID").

  3. In Description, describe the dataset for easier identification.

  4. In Select a Provider, choose the ClickHouse Dataset Provider you configured earlier.

  5. Fill in the following fields:

    • Database name: Case-sensitive name of the ClickHouse database (leave empty if using the default database).

    • Table name, view, or query: Enter the name of a table or view, or a SQL query (e.g., logs or SELECT * FROM logs). Be aware that if the table or query has no sorting key, results are limited to 100,000 rows.

    • Timestamp field (optional): The name of the column that holds the timestamp for time-based queries.

  6. Click Save to finalize the dataset configuration.
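If you are unsure whether a table has a sorting key, and therefore whether the 100,000-row limit applies, note that ClickHouse reports each table's sorting key in its system.tables system table. The Python sketch below builds an HTTP interface URL for that lookup. It passes the database and table names as ClickHouse query parameters instead of splicing them into the SQL. The function names and the example endpoint are hypothetical. Send the URL with the same credentials you use for the provider. This covers only tables; it tells you nothing about a custom SQL query entered in the dataset.

```python
import urllib.parse


def sorting_key_lookup_url(endpoint: str, database: str, table: str) -> str:
    """Build a ClickHouse HTTP interface URL that returns one table's sorting key.

    The {db:String} and {tbl:String} placeholders are ClickHouse query
    parameters, filled from the param_db and param_tbl URL arguments.
    """
    sql = (
        "SELECT sorting_key FROM system.tables "
        "WHERE database = {db:String} AND name = {tbl:String}"
    )
    params = {"query": sql, "param_db": database, "param_tbl": table}
    # quote_via=quote encodes spaces as %20 instead of "+".
    encoded = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return f"{endpoint.rstrip('/')}/?{encoded}"


def has_sorting_key(response_body: str) -> bool:
    """Interpret the TabSeparated response body.

    A non-empty value means ClickHouse reports a sorting key. An empty body
    means no sorting key, or that no table matched the names given.
    """
    return bool(response_body.strip())
```

For a table in the default database, pass "default" as the database name.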

How to Query ClickHouse with Cribl Search  

Now that we have both the ClickHouse dataset provider and the dataset ready, let’s run a few searches. These examples use an internal synthetic dataset (clickhouse_orderitem) to demonstrate the capabilities of Cribl Search with ClickHouse.

Basic Query Example

Retrieve the first 1,000 rows from the dataset:

dataset="clickhouse_orderitem" | limit 1000

Aggregate and Sort Example

Calculate total sales revenue grouped by product_category and sort in descending order:

dataset="clickhouse_orderitem"
| summarize TotalRevenue=sum(price * quantity) by product_category
| sort by TotalRevenue desc
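To make the revenue query concrete, the Python sketch below applies the same logic to a few in-memory rows. It multiplies price by quantity for each row, sums the results per product_category, and then sorts by the total, largest first. The rows are made-up sample data, not the contents of clickhouse_orderitem. The sketch illustrates only what the summarize and sort stages compute, not how Cribl Search or ClickHouse executes them.

```python
from collections import defaultdict

# Hypothetical sample rows; the real clickhouse_orderitem schema may differ.
SAMPLE_ORDER_ITEMS = [
    {"product_category": "books", "price": 12.5, "quantity": 2},
    {"product_category": "toys", "price": 30.0, "quantity": 1},
    {"product_category": "books", "price": 8.0, "quantity": 3},
    {"product_category": "garden", "price": 45.0, "quantity": 2},
]


def revenue_by_category(rows):
    """Sum price * quantity per product_category, then sort by total, descending."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["product_category"]] += row["price"] * row["quantity"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

With the sample rows, revenue_by_category(SAMPLE_ORDER_ITEMS) returns [("garden", 90.0), ("books", 49.0), ("toys", 30.0)].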

Category Popularity Example

Find the top 5 most popular product categories based on the number of items sold:

dataset="clickhouse_orderitem" 
| summarize ItemsSold=sum(quantity) by product_category
| sort by ItemsSold desc
| limit 5
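The same idea applies to the top-5 query, sketched in Python below on made-up rows. Quantities are summed per category, and collections.Counter.most_common(5) stands in for the sort and limit stages. As before, the data is hypothetical, and the code illustrates only the result the query describes.

```python
from collections import Counter

# Hypothetical sample rows with six categories, so the top-5 cut is visible.
SAMPLE_QUANTITIES = [
    {"product_category": "books", "quantity": 3},
    {"product_category": "toys", "quantity": 5},
    {"product_category": "garden", "quantity": 2},
    {"product_category": "music", "quantity": 9},
    {"product_category": "sports", "quantity": 1},
    {"product_category": "kitchen", "quantity": 6},
    {"product_category": "books", "quantity": 4},
]


def top_categories(rows, n=5):
    """Sum quantity per product_category and return the n largest totals, descending."""
    items_sold = Counter()
    for row in rows:
        items_sold[row["product_category"]] += row["quantity"]
    return items_sold.most_common(n)
```

Here top_categories(SAMPLE_QUANTITIES) returns [("music", 9), ("books", 7), ("kitchen", 6), ("toys", 5), ("garden", 2)], and "sports" falls outside the top five.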

Wrapping Up

With the ClickHouse Dataset Provider, Cribl Search is breaking new ground in data accessibility and analytics. This integration makes it easier than ever to query and analyze petabyte-scale datasets, whether you’re looking for real-time insights or diving into historical trends. Ready to be one of the first to experience it? Set up your ClickHouse integration today to explore how Cribl can help transform your data management strategy.

Don’t miss out on this powerful new capability: start using Cribl.Cloud free today or schedule a personalized demo to see Cribl Search and ClickHouse in action!

Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl’s suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs.

We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.
