GitHub Repository | Duration: 2 hours | Difficulty: Beginner to Advanced

Overview

This hands-on workshop walks you through building a complete real-time data pipeline using Confluent Cloud. You’ll stream live cryptocurrency price data from the CoinGecko API, process it with Apache Flink SQL, and materialize it as Apache Iceberg tables using Tableflow — all queryable through DuckDB.

[Figure: Pipeline Overview]

What You’ll Learn

  • Set up and configure a Kafka cluster on Confluent Cloud

  • Create topics and ingest live data using the HTTP Source Connector

  • Materialize Kafka topics as Iceberg tables with Tableflow

  • Write Flink SQL for real-time stream processing

  • Query real-time and historical data with DuckDB

Technologies

  • Apache Kafka: data streaming and topic management

  • Apache Flink: real-time stream processing via Flink SQL

  • Tableflow: materializing Kafka topics as Apache Iceberg tables

  • DuckDB: lightweight analytics on Iceberg tables

  • Schema Registry: Avro schema validation

  • CoinGecko API: live cryptocurrency price data source

Workshop Modules

Module 1: Setting Up Confluent Cloud

15 minutes

Validate prerequisite tools, authenticate with Confluent Cloud, create a Kafka cluster, generate API keys, and validate Tableflow access.

Module 2: Kafka Hands-On with CoinGecko Data

30 minutes

Create Kafka topics, set up an HTTP Source Connector for live cryptocurrency data, and produce and consume real-time price events.

[Figure: Kafka Data Ingestion]

Module 3: Tableflow & Iceberg Setup

25 minutes

Materialize the crypto-prices topic as an Iceberg table via Tableflow, connect DuckDB to the Iceberg REST Catalog, and run real-time analytics queries.
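
As a rough sketch of the DuckDB side, the queries below assume the Tableflow Iceberg REST Catalog has already been attached in DuckDB under the hypothetical alias `tableflow`, and that the Iceberg namespace is your Kafka cluster ID (shown here as the placeholder `<cluster-id>`). Neither name comes from the workshop itself, so adjust them to match your environment.

```sql
-- Assumption: the catalog is already attached as "tableflow" (hypothetical alias).
-- Assumption: "<cluster-id>" is a placeholder for your own Kafka cluster ID.
-- Hyphenated identifiers such as crypto-prices must be double-quoted in DuckDB.

-- Peek at a few materialized rows.
SELECT *
FROM tableflow."<cluster-id>"."crypto-prices"
LIMIT 5;

-- Count the rows materialized so far; re-run to watch the table grow.
SELECT count(*) AS row_count
FROM tableflow."<cluster-id>"."crypto-prices";
```

Running the count twice a few minutes apart is a quick way to confirm that new Kafka events are landing in the Iceberg table.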

[Figure: Tableflow and Iceberg]

Module 4: Stream Processing with Flink

45 minutes

The core module. Create a Flink compute pool, transform nested cryptocurrency data, write SQL queries for real-time analysis, and create derived tables:

  • price-alerts — threshold-based price alerts

  • crypto-trends — rolling trend analysis

  • crypto-predictions — pattern-based predictions
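
To make the "transform nested data" step concrete, here is a minimal Flink SQL sketch of how a nested record could be exploded into one row per coin. It assumes the `crypto-prices` topic exposes one ROW-typed column per coin (`bitcoin`, `ethereum`), each with a `usd` field, and that the Confluent Cloud system column `$rowtime` is available. The column names are illustrative assumptions, so the workshop's actual schema may differ.

```sql
-- Assumption: crypto-prices has ROW-typed columns named after each coin,
-- each containing a numeric usd field (hypothetical schema).
-- Assumption: $rowtime is the Confluent Cloud system column holding the
-- Kafka record timestamp.

-- Preview query: one output row per coin per incoming event.
SELECT 'bitcoin' AS symbol, bitcoin.usd AS price, `$rowtime` AS `timestamp`
FROM `crypto-prices`
UNION ALL
SELECT 'ethereum' AS symbol, ethereum.usd AS price, `$rowtime` AS `timestamp`
FROM `crypto-prices`;
```

A flat `symbol`, `price`, `timestamp` shape like this is what makes the later filters and aggregations straightforward to write.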

[Figure: Flink Stream Processing]

-- Example: Real-time price alerts
SELECT symbol, price, `timestamp`
FROM crypto_prices_exploded
WHERE price > 50000 AND symbol = 'bitcoin';
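
The alert query above can be turned into a derived table, and a rolling trend can be computed with an OVER window. The sketch below is one possible shape, not the workshop's exact statements: it assumes `crypto_prices_exploded` has the columns used above, that `$rowtime` (the Confluent Cloud system time column) is available on it, and that a 5-minute range is a reasonable illustrative window.

```sql
-- Sketch: materialize the threshold alerts as a derived table
-- using CREATE TABLE AS SELECT.
CREATE TABLE `price-alerts` AS
SELECT symbol, price, `timestamp`
FROM crypto_prices_exploded
WHERE symbol = 'bitcoin' AND price > 50000;

-- Sketch: rolling 5-minute average price per symbol.
-- Assumption: $rowtime is the table's event-time attribute;
-- the 5-minute range is an illustrative choice.
SELECT
  symbol,
  price,
  AVG(price) OVER (
    PARTITION BY symbol
    ORDER BY `$rowtime`
    RANGE BETWEEN INTERVAL '5' MINUTE PRECEDING AND CURRENT ROW
  ) AS avg_price_5m
FROM crypto_prices_exploded;
```

Comparing `price` against `avg_price_5m` in a downstream query is one simple way to flag upward or downward movement.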

Module 5: Teardown

15 minutes

A critical cleanup guide to prevent accidental charges: stop Flink compute pools, delete connectors, remove API keys, and optionally tear down the cluster.

Getting Started

The workshop supports multiple environments:

  • GitHub Codespaces — one-click setup, everything pre-configured

  • VS Code Dev Containers — local Docker-based environment

  • Local setup — bring your own tools (Confluent CLI, DuckDB, jq)

git clone https://github.com/gAmUssA/cc-workshop.git
cd cc-workshop

See the prerequisites in the repo README.