Stream DaQ Documentation#

Stream DaQ is a free and open-source Python library that makes data quality monitoring for streaming data as simple as a few lines of code. Monitor your data streams in real time, get instant alerts when quality issues arise, and keep your data pipelines running smoothly.

👋 Our Manifesto

Understand what Stream DaQ is all about

🚀 Quick Start

Get up and running in less than 5 minutes

⚡ 5-Minute Quickstart
💡 Examples

Explore real-world examples and use cases

📚 Concepts

Learn how data quality works for streaming data

📖 API Reference

Complete API documentation

💌 Contributing

Help make Stream DaQ a better tool


Installation#

pip install streamdaq

Requirements: Python >= 3.11
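If you are unsure whether your interpreter meets this requirement, a quick check like the one below can help before installing. It is a minimal sketch that uses only the standard library; the helper name `meets_requirement` is illustrative and not part of Stream DaQ.

```python
import sys

# Minimum version stated in the requirements above.
MIN_PYTHON = (3, 11)


def meets_requirement(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    status = "OK" if meets_requirement() else "too old for streamdaq"
    print(f"Python {current}: {status}")
```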

TL;DR#

# pip install streamdaq

from streamdaq import StreamDaQ, DaQMeasures as dqm, Windows

# Step 1: Configure your monitoring setup
daq = StreamDaQ().configure(
    window=Windows.tumbling(3),
    instance="user_id",
    time_column="timestamp",
    wait_for_late=1,
    time_format='%Y-%m-%d %H:%M:%S'
)

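# Step 2: Define what Data Quality means for you
# check_most_frequent_items is a user-defined check and is not provided by
# streamdaq. The placeholder below is an assumption: it presumes that `assess`
# accepts a callable that takes the measured value and returns True on success.
def check_most_frequent_items(most_frequent) -> bool:
    return most_frequent is not None  # replace with your own logic

daq.add(dqm.count('interaction_events'), assess="(5, 15]", name="count") \
   .add(dqm.max('interaction_events'), assess=">5.09", name="max_interact") \
   .add(dqm.most_frequent('interaction_events'), assess=check_most_frequent_items, name="freq_interact")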

# Step 3: Start monitoring and let Stream DaQ do the work
daq.watch_out()
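
The `assess` strings in Step 2 read like standard interval and comparison notation: `"(5, 15]"` would mean greater than 5 and at most 15, and `">5.09"` strictly greater than 5.09. The standalone sketch below illustrates that reading in plain Python. It is an assumption about the semantics, not Stream DaQ's actual parser, and `parse_assessment` is a hypothetical helper that does not exist in the library.

```python
import operator
import re
from typing import Callable

# Comparison operators, two-character symbols first so ">=" wins over ">".
_COMPARATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}

# Matches interval notation such as "(5, 15]" or "[3, 8]".
_INTERVAL = re.compile(r"^([\[(])\s*([^,]+?)\s*,\s*([^,]+?)\s*([\])])$")


def parse_assessment(expr: str) -> Callable[[float], bool]:
    """Turn an expression such as "(5, 15]" or ">5.09" into a predicate."""
    expr = expr.strip()
    match = _INTERVAL.match(expr)
    if match:
        left, low, high, right = match.groups()
        low, high = float(low), float(high)
        # "(" and ")" exclude the bound, "[" and "]" include it.
        above = operator.gt if left == "(" else operator.ge
        below = operator.lt if right == ")" else operator.le
        return lambda value: above(value, low) and below(value, high)
    for symbol, compare in _COMPARATORS.items():
        if expr.startswith(symbol):
            threshold = float(expr[len(symbol):])
            return lambda value: compare(value, threshold)
    raise ValueError(f"Unsupported assessment expression: {expr!r}")


in_range = parse_assessment("(5, 15]")
print(in_range(5), in_range(6), in_range(15))  # False True True
```

Under this reading, a window with exactly 5 events would fail the `count` check, while one with 15 events would pass.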

Key Features#

⚡ Real-time Monitoring

Get instant alerts when your data quality drops below your defined thresholds

🔧 Highly Configurable

Choose from 30+ built-in quality measures or create your own in plain Python

🪟 Flexible Windows

Support for tumbling, sliding, and session-based windows to fit your use case

🎯 Stream-Native

Built specifically to address the challenges of unbounded streams

🐍 Pure Python

If you can write Python, you can monitor your data streams with Stream DaQ

📊 Rich Output

Quality check results are themselves emitted as a stream, ready for further processing or alerting
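
To make the window types mentioned above more concrete, here is a minimal pure-Python sketch of how tumbling and sliding windows assign an event timestamp to windows. It does not use Stream DaQ, and it models neither session windows nor late arrivals; the function names, the half-open `[start, end)` convention, and the integer time units are assumptions chosen for illustration.

```python
def tumbling_window(timestamp: int, size: int) -> tuple[int, int]:
    """Return the single [start, end) tumbling window containing timestamp."""
    start = (timestamp // size) * size
    return (start, start + size)


def sliding_windows(timestamp: int, size: int, hop: int) -> list[tuple[int, int]]:
    """Return every [start, end) sliding window containing timestamp.

    Windows start at multiples of hop and span size time units.
    """
    windows = []
    # Latest window start that is <= timestamp.
    start = (timestamp // hop) * hop
    # Walk backwards while the window still covers the timestamp.
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= hop
    return sorted(windows)


print(tumbling_window(7, size=3))         # (6, 9)
print(sliding_windows(7, size=6, hop=3))  # [(3, 9), (6, 12)]
```

A tumbling window places each event in exactly one window, whereas a sliding window with a hop smaller than its size places the same event in several overlapping windows.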

Perfect for

  • Data Engineers building robust, end-to-end streaming pipelines

  • Data Scientists ensuring model input quality

  • MLOps Engineers monitoring production data flows

  • Analytics Teams maintaining dashboard reliability

  • Data Enthusiasts exploring the state of the art in data quality

Next Steps#

Ready to dive in? Here are some suggested paths:

New to Stream DaQ? → Start with 👋 Our Manifesto

Starving for action? → Jump straight to the ⚡ 5-Minute Quickstart

Eager to deepen your understanding? → Read 📚 Concepts

Looking for examples? → Check out 💡 Examples

Need detailed configuration? → Browse User Guide Overview

Support & Community#

We are a small, dedicated team committed to making Stream DaQ the best it can be. Stream DaQ is and will always be free and open-source. We really appreciate your support in making this project better. Here are some ways you can help:

  • 🐛 Report bugs: GitHub Issues

  • 💬 Ask questions: GitHub Discussions

  • Star the project: GitHub Repository

  • 📧 Contact the team:
    • papster at csd.auth.gr - Vassilis, primary maintainer 👷‍♂️

    • gounaria at the same domain - Anastasios, project supervisor 🦸

Acknowledgments#

Stream DaQ is developed by the Data Engineering (DELAB) Team of Datalab AUTh, under the supervision of Prof. Anastasios Gounaris. Special thanks to Maria Kavouridou for giving birth to Quacklity, the Stream DaQ mascot!