
ClickHouse and Cribl: A Powerful Data Ingestion and Analysis Duo

Last edited: February 11, 2025

Efficient data ingestion, storage, and analysis are critical for businesses that want to make informed decisions. ClickHouse is a fast, columnar database built for analytics, while Cribl Stream ingests and transforms data from various sources before delivering it to destinations like ClickHouse. Combined, ClickHouse and Cribl offer a robust and scalable solution to meet these needs.

Benefits of Using ClickHouse and Cribl

Cribl Stream is a data ingestion and enrichment platform that sits between your data sources and ClickHouse. Because it can preprocess, filter, and transform events before they are stored, it reduces what ClickHouse has to store and scan. The main benefits are:

  • Seamless Integration: Cribl Stream integrates with diverse data sources, including logs, metrics, and trace data, preprocessing and transforming events before sending them to ClickHouse.

  • Optimized Performance: Pre-filtered and enriched data ensures faster query performance in ClickHouse.

  • Scalability: The combination handles petabyte-scale datasets and supports both real-time and historical analysis.

Step-by-Step Setup

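Before getting started, ensure you have a ClickHouse Cloud account and access to Cribl Stream and Cribl Search (for example, through Cribl.Cloud).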

Getting Started with ClickHouse

After completing the ClickHouse signup process, create the default service, My First Service, using your preferred cloud provider.

ClickHouse Credentials

After the service is created, expand Connect your app and note the username, password, and URL of the new ClickHouse service. If you didn't note the password, you can reset it.


Configuring ClickHouse

  • URL: https://your_instance.clickhouse.cloud:8443

  • Username: default

  • Password: Your_ClickHouse_Password
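To confirm these three values before wiring anything up, one option is ClickHouse's HTTP interface, which is what the port 8443 URL exposes. The following is a minimal sketch using only the Python standard library. The host name and password are the placeholders from the list above, and build_ping_request and ping are hypothetical helper names, not part of any ClickHouse or Cribl tooling.

```python
import base64
import urllib.request
from urllib.parse import quote


def build_ping_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request asking ClickHouse to evaluate `SELECT 1`."""
    # HTTP basic auth header: base64("user:password")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    full_url = f"{url.rstrip('/')}/?query={quote('SELECT 1')}"
    return urllib.request.Request(full_url, headers={"Authorization": f"Basic {token}"})


def ping(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


# Placeholder values from the list above; substitute your own.
request = build_ping_request(
    "https://your_instance.clickhouse.cloud:8443", "default", "Your_ClickHouse_Password"
)
print(request.full_url)
# Uncomment once real credentials are filled in:
# print(ping(request))
```

If the URL, username, and password are correct, the service should return the query result. An authentication error at this stage points at the credentials rather than at anything configured later in Cribl Stream.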

Adding Data in ClickHouse

First, create an empty table that matches the schema of the events you will send from Cribl Stream. Expand the Add data section and select Create an empty table. If the service has gone idle, select the Wake service button to bring it back.


Creating the Table in ClickHouse

Select Create an empty table. We will use SQL to create the Cribl table with a schema that matches events being sent from Cribl Stream.

Creating a Table in ClickHouse Using SQL

The test event sent by Cribl Stream consists of many keys of type String, with one exception: the Cribl _time field. This field carries millisecond precision (three decimal places), so its column needs the type DateTime64(3) to store event timestamps correctly. In SQL Console → Queries → New query, paste the following SQL and select Run. Note that the cribl_test field is omitted, as it isn't needed for this example.
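The SQL statement itself is not reproduced in this copy of the post. As a stand-in, here is a minimal sketch that generates a statement matching the description: one _time column typed DateTime64(3) and the remaining keys typed String. The column names _raw, host, and source are assumptions, as are the MergeTree engine and the ORDER BY clause; replace them with the keys in your own test event. Python is used only to assemble the text, which you would then paste into the SQL console.

```python
def build_create_table(table: str, string_columns: list[str], database: str = "default") -> str:
    """Return a CREATE TABLE statement with a DateTime64(3) `_time` column
    followed by one String column per name in `string_columns`."""
    columns = ["`_time` DateTime64(3)"]  # millisecond precision, as described above
    columns += [f"`{name}` String" for name in string_columns]
    column_sql = ",\n    ".join(columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {database}.{table}\n"
        f"(\n    {column_sql}\n)\n"
        "ENGINE = MergeTree\n"  # assumed engine; not stated in the post
        "ORDER BY `_time`"      # assumed sort key; not stated in the post
    )


# Hypothetical column names; use the keys from your own test event.
print(build_create_table("Cribl", ["_raw", "host", "source"]))
```

The table name Cribl and the database default match the values used for the Cribl Stream Destination later in this walkthrough.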

Figure: Query Interface in ClickHouse

Creating a Cribl Stream ClickHouse Destination

In Cribl Stream, select Data → Destinations → ClickHouse → Add Destination and fill in the required fields below. If you need the connection details again, select the Connect option on the left side of your ClickHouse menu to bring them back up. You can also reset the password there if needed.

  • Output ID: ClickHouse

  • URL: https://your_instance.clickhouse.cloud:8443

  • ClickHouse database: default

  • ClickHouse table: Cribl

  • Username: default

  • Password: Your_ClickHouse_Password

  • Select Save → Commit & Deploy

Wait a couple of minutes for the Cribl deployment to finish.
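Cribl Stream handles delivery for you once the Destination is deployed. To see how the fields above map onto a ClickHouse request, here is a hedged sketch of a manual insert over ClickHouse's HTTP interface. It assumes the JSONEachRow input format and hypothetical event keys (_raw, host), and build_insert_request is an illustrative helper name. It shows the mapping only and is not a description of what Cribl Stream sends internally.

```python
import json
import urllib.request
from urllib.parse import quote


def build_insert_request(url, database, table, user, password, events):
    """Build a POST request that inserts `events` (a list of dicts)
    into database.table as newline-delimited JSON."""
    query = f"INSERT INTO {database}.{table} FORMAT JSONEachRow"
    # JSONEachRow expects one JSON object per line in the request body.
    body = "\n".join(json.dumps(event) for event in events).encode("utf-8")
    return urllib.request.Request(
        f"{url.rstrip('/')}/?query={quote(query)}",
        data=body,
        method="POST",
        headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
    )


# Placeholder connection values from the Destination settings above.
request = build_insert_request(
    "https://your_instance.clickhouse.cloud:8443",
    "default",
    "Cribl",
    "default",
    "Your_ClickHouse_Password",
    # Hypothetical event; _time is epoch seconds with millisecond precision.
    [{"_time": 1739232000.123, "_raw": "example event", "host": "example-host"}],
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

The request is only built and printed here; passing it to urllib.request.urlopen would send it. Each Destination field appears exactly once: the URL as the base address, the database and table in the INSERT statement, and the username and password as request headers.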

Figure: Configuring the Cribl Destination

Testing the Cribl Stream ClickHouse Destination

In Cribl Stream, open the ClickHouse Destination, select the Test tab at the top and select Run Test a few times. You should see Success at the bottom of the window.

Figure: Cribl Stream ClickHouse Destination

Exploring the Events in ClickHouse

In ClickHouse, select My First Service → SQL Console to display the events sent from Cribl Stream. Congratulations! Next, let's search these events in Cribl Search.

Figure: Events from Cribl Stream in ClickHouse

Now that all the plumbing is done, the next step is to set up the Data Provider and Dataset in Cribl Search, then search ClickHouse.

Creating a Cribl Search Data Provider

Select Cribl Search → Data → Data Providers → Add Provider → Create

Figure: Cribl Search Data Provider

Creating a Cribl Search Dataset

Select Cribl Search → Data → Datasets → Add Dataset

  • ID: ClickHouse_Dataset

  • Database name: default

  • Table name, view, or query: Cribl

  • Timestamp field: _time

  • Select Save

Figure: Cribl Search Dataset

Select Cribl Search and run the following query:

  • dataset="ClickHouse_Dataset" | limit 1000

Figure: Cribl Search of ClickHouse

Leveraging Cribl Stream Datagen

To effectively test ClickHouse, we'll leverage Cribl Stream's Datagen source to generate synthetic data streams, simulating real-world scenarios. In the default Cribl Stream Worker Group, select Data → Sources. In the filter on the top right, type Datagen to quickly locate this Source.

Figure: Cribl Stream Datagen Source

Select Datagen → Add Source

  • Input ID: Datagen_Apache

  • Data Generator File: apache_common.log

  • Events Per Second Per Worker Node: 10

  • Select Save

Figure: Cribl Stream Datagen Configuration

Routing Events to ClickHouse from Cribl Stream

The next step is to route events to ClickHouse using Cribl Stream's QuickConnect.

  • Select Routing → QuickConnect → Add Source → Datagen → Select Existing → Datagen_Apache and select Yes to switch to QuickConnect.

  • Select the + on the right side of the Datagen source and connect the line to ClickHouse. Select Passthru.

  • Select Save → Commit & Deploy

Within a few moments, the Datagen Source will generate synthetic Apache events, and they will be routed to ClickHouse.

Figure: Cribl Stream QuickConnect

In Cribl Search, running the query dataset="ClickHouse_Dataset" | count displays the count of events in ClickHouse.
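As a rough sanity check on that count, the Datagen rate configured earlier gives an estimate of how many events to expect. Here is a small sketch of the arithmetic, assuming a single Worker Node and a steady 10 events per second. The worker count and elapsed times are example values, not figures from this walkthrough.

```python
def expected_events(events_per_second: int, worker_nodes: int, elapsed_seconds: int) -> int:
    """Events generated at a steady per-node rate over the elapsed time."""
    return events_per_second * worker_nodes * elapsed_seconds


# 10 events per second per Worker Node, as set on the Datagen Source.
for label, seconds in [("1 minute", 60), ("1 hour", 3600), ("1 day", 86400)]:
    print(f"{label}: {expected_events(10, 1, seconds):,} events")
# 1 minute: 600 events
# 1 hour: 36,000 events
# 1 day: 864,000 events
```

At this rate, a Datagen left running adds close to a million rows a day per Worker Node, so the count should be in this range for the time elapsed since deployment.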

Note: Remember to turn off the Cribl Stream Datagen after your testing is complete.

Figure: Cribl Search (left) and ClickHouse (right)

Summary

You now have a powerful, flexible, and cost-effective solution for ingesting and analyzing large-scale event data. With the ClickHouse Dataset Provider, Cribl Search is breaking new ground in data accessibility and analytics. This integration makes it easier than ever to query and analyze petabyte-scale datasets, whether you’re looking for real-time insights or diving into historical trends. Ready to be one of the first to experience it? Set up your ClickHouse integration today to explore how Cribl can help transform your data management strategy.

Don’t miss out on this powerful new capability—start using Cribl.Cloud free today or schedule a personalized demo to see Cribl Search and ClickHouse in action!

