Kafka

Flink SQL Enrichment Strategies on Confluent Cloud (and the AI Skill That Writes Them for You)

TLDR Rion Williams wrote the theory on four Flink enrichment strategies. I built the runnable SQL for Confluent Cloud, where a few things work differently than open-source Flink (no PROCTIME(), no JDBC lookup joins). External enrichment uses a regular LEFT JOIN against a compacted Kafka topic. Gradual enrichment uses an event-time temporal join that gives you version-correct customer data per order. Both run as pure Flink SQL on Confluent Cloud. I published a Flink SQL skill on the Tessl registry that generates these queries for you. Pair it with mcp-confluent and your AI assistant can write the SQL, create the topics, and submit the Flink statements without you leaving the editor. ...

Building a Streaming Lakehouse with Open Source: Kafka to Iceberg to Trino to Superset

TLDR I built an open-source streaming lakehouse: Kafka ingests events, Flink processes them, Iceberg stores them as tables, Trino queries them, and Superset visualizes them. One make demo command runs it all locally. Even when your data lands in Iceberg automatically (as it does with Confluent Tableflow), you still need a query engine and a visualization layer. This stack builds that full picture with open-source components. MinIO went closed-source, so I switched to SeaweedFS (thanks to Robin Moffatt’s research). And Flink’s dependency management is still a jar-shaped nightmare. ...

Hands-On with Confluent Cloud: Apache Kafka, Flink, and Tableflow

GitHub Repository | Duration: 2 hours | Difficulty: Beginner to Advanced

Overview

This hands-on workshop walks you through building a complete real-time data pipeline using Confluent Cloud. You’ll stream live cryptocurrency price data from the CoinGecko API, process it with Apache Flink SQL, and materialize it as Apache Iceberg tables using Tableflow, all queryable through DuckDB.

What You’ll Learn

Set up and configure a Kafka cluster on Confluent Cloud
Create topics and ingest live data using the HTTP Source Connector
Materialize Kafka topics as Iceberg tables with Tableflow
Write Flink SQL for real-time stream processing
Query real-time and historical data with DuckDB ...

Goodbye Confluent

TL;DR May 6th, 2021, was my last day at Confluent. Here is the email I sent to my former co-workers at @ConfluentInc.

Dear team,

I’d like to let you know that I am leaving my position as Developer Advocate on May 6th. I have really enjoyed my time here at Confluent, and I appreciate having had the opportunity to work with many of you. It was quite a ride, but I would do it again without a doubt. ...

The Ultimate Oracle Code One 2019 Guide for Kafka and Stream Processing wisdom seekers

TL;DR Oracle Code One 2019 is upon us! Read this post to find all the sessions where you can learn about Apache Kafka® and stream processing! I will also include some very subjective personal recommendations. Don’t hesitate to reach out if you would like me to add any details!

Table 1. Revisions history
Version | Date | Comments
v1.0 | 9/16/2019 | Initial revision

Monday

Apache Kafka Versus Integration Middleware (MQ, ETL, ESB): Friends or Enemies? [DEV1187] by Kai Waehner
09:00 AM - 09:45 AM | Moscone South - Room 302

Getting Started with Kafka [DEV2417] Nikhil Nanivadekar
12:30 PM - 01:15 PM | Moscone South - Room 207/208

Building Event-Driven Applications with Oracle’s Fn Project and Apache Kafka [DEV1917]
01:30 PM - 02:15 PM | Moscone South - Room 304

Building Reactive Pipelines: How to Go from Scalable Apps to Scalable Systems [DEV1256] by Mark Heckler
12:30 PM - 01:15 PM | Moscone South - Room 207/208

Streaming Machine Learning with Python, Jupyter, TensorFlow, Apache Kafka, and KSQL [DEV1185]
04:00 PM - 04:45 PM | Moscone South - Room 201

Query and Analyze Kafka Streams with Oracle SQL [DEV4292]
05:00 PM - 05:45 PM | Moscone South - Room 204 ...

Stream Processing Like You Have Never Seen Before

TL;DR This is the playbook for my «Stream Processing like you have never seen before» talk. The full source code is available.

Table 1. Revisions history
Version | Date | Comments
v1.1 | 11/05/2019 | Updated version, presented at NYC Cloud Native meetup
v1.0 | 09/05/2019 | Initial revision, presented at DC Spring Meetup

Spring Kafka Application

Getting started

Go to https://start.spring.io and generate a project using «Spring for Apache Kafka», «Spring for Apache Kafka Streams», «Lombok», «Cloud Streams» ...

Where in the world is Viktor in second half of 2019

TL;DR This blog post was inspired by similar posts from my colleagues @rmoff [1] and @riferrei [2]. Here you can find all my speaking appearances until the end of 2019.

Table 1. Revisions history
Version | Date | Comments
v1.1 | 10/03/2019 | Added New York events
v1.0 | 9/05/2019 | Initial revision

September 🇺🇸

5th of September: Washington, DC - DC Spring Framework
I will be talking about the integration between Kafka and the Spring Framework. And I will try to live code! ...

Who is tweeting about hashtag KSQL?

TL;DR Another day, another post. This time it’s the playbook for my http://DataSciCon.tech talk «Who’s tweeting about #datascicon» on November 30th, 2018 [1]. The full source code is published in the confluentinc/demo-scene repository [2].

Table 1. Revisions history
Version | Date | Comments
v1.1 | 12/02/2018 | Small fixes in code, screenshots, images
v1.0 | 11/28/2018 | Initial revision

Prerequisites

Docker
Docker Compose

Get example from GitHub

If you want to follow the steps below, check out only the directory that contains the source code relevant to this post.

    mkdir -p ~/temp/demo-scene
    cd ~/temp/demo-scene
    git init .
    git remote add -f origin https://github.com/confluentinc/demo-scene/
    git config core.sparsecheckout true
    echo "twitter-streams/*" >> .git/info/sparse-checkout
    git pull --depth=2 origin master
    cd twitter-streams
    ls -lh

...

Streaming Movies Ratings with Kafka Streams and KSQL

TL;DR The main purpose of this blog post is to draft a playbook for my presentation «Crossing the streams: Rethinking Stream processing with Kafka Streams and KSQL» [1] that I recently gave at Kafka Summit 2018 in San Francisco. The full source code is published in the confluentinc/demo-scene repository [2].

Table 1. Revisions history
Version | Date | Comments
v1.2 | 01/17/2019 | Use CP 5.1.0, updated Control Center screenshots
v1.1 | 11/21/2018 | Fixed links and minor grammar
v1.0 | 11/20/2018 | Initial revision

Disclaimer: Another goal is to exercise some ideas around the visual representation of posts in this blog. The third and last goal is to brush up my technical writing skills, since I moved to DevX [3] from Professional Services, where I wrote a truckload of customer engagement reports. ...

Divide, Distribute and Conquer — Stream v. Batch @ Philly JUG

TL;DR On September 13th, 2017, I presented «Divide, Distribute and Conquer: Stream v. Batch» at Philly JUG. In this presentation I talked about how developers and data engineers are changing their perception of data processing with streaming data technologies.

Table 1. Revisions history
Version | Date | Comments
v1.0 | 09/20/2017 | Initial revision

Tweets

@ThePhillyJUG meetup where @gamussa talked about stream vs batch, #kafka #kafkastreams pic.twitter.com/6KlFwfnCAw
— Jason Young (@jythejavaguy) September 14, 2017

@gamussa: "how many here use Gradle? Maven? Hmm, people have made some poor life choices. (J/K!)" Didn't say which was the right choice :) pic.twitter.com/rPAGZNKEoy
— Jason Young (@jythejavaguy) September 14, 2017 ...