Stream DaQ Documentation#
Stream DaQ is a free and open-source Python library that makes data quality monitoring for streaming data as simple as a few lines of code. Monitor your data streams in real time, get instant alerts when quality issues arise, and keep your data pipelines running smoothly.
Understand what Stream DaQ is all about
Get up and running in less than 5 minutes
Explore real-world examples and use cases
Learn how data quality works for streaming data
Complete API documentation
Help make Stream DaQ a better tool
Installation#
pip install streamdaq
Requirements: Python >= 3.11
TL;DR#
# pip install streamdaq
from streamdaq import StreamDaQ, DaQMeasures as dqm, Windows

# Step 1: Configure your monitoring setup
daq = StreamDaQ().configure(
    window=Windows.tumbling(3),
    instance="user_id",
    time_column="timestamp",
    wait_for_late=1,
    time_format='%Y-%m-%d %H:%M:%S'
)

# Step 2: Define what Data Quality means for you
# Note: check_most_frequent_items is a user-defined callable; it is not part of
# this snippet and must be defined before this statement runs.
daq.add(dqm.count('interaction_events'), assess="(5, 15]", name="count") \
    .add(dqm.max('interaction_events'), assess=">5.09", name="max_interact") \
    .add(dqm.most_frequent('interaction_events'), assess=check_most_frequent_items, name="freq_interact")

# Step 3: Start monitoring and let Stream DaQ do the work
daq.watch_out()
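The TL;DR snippet passes `check_most_frequent_items` as an `assess` argument without defining it. Below is a minimal sketch of what such a function might look like. It assumes that Stream DaQ calls the function with the measure's value for each window and expects a boolean back; the shape of that value (a single item or an iterable of items) and the `ALLOWED_ITEMS` set are hypothetical and chosen only for illustration, so check the API documentation for the actual contract.

```python
from collections.abc import Iterable

# Hypothetical set of values considered acceptable; not taken from Stream DaQ.
ALLOWED_ITEMS = {1, 2, 3, 4, 5}


def check_most_frequent_items(most_frequent_items) -> bool:
    """Return True if every most-frequent item belongs to ALLOWED_ITEMS.

    Assumption: the value arrives either as a single item or as an
    iterable of items. Strings are treated as single items.
    """
    is_single = isinstance(most_frequent_items, (str, bytes)) or not isinstance(
        most_frequent_items, Iterable
    )
    if is_single:
        most_frequent_items = [most_frequent_items]
    return all(item in ALLOWED_ITEMS for item in most_frequent_items)
```

With a definition like this placed before Step 2, the snippet above no longer refers to an undefined name.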
Key Features#
Get instant alerts when your data quality drops below your defined thresholds
Choose from 30+ built-in quality measures or create your own in plain Python
Support for tumbling, sliding, and session-based windows to fit your use case
Built specifically to address the challenges of unbounded streams
If you can write Python, you can monitor your data streams with Stream DaQ
Quality check results are emitted as a stream themselves, ready for further processing or alerting
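To make the windowing idea concrete, here is a plain-Python illustration of how a tumbling window groups events. This is a conceptual sketch only, not how Stream DaQ implements windows; the function name `tumbling_window_counts` and the use of integer timestamps are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_counts(timestamps, size):
    """Count events per tumbling window of `size` time units.

    Tumbling windows do not overlap: each timestamp falls into exactly
    one window covering [start, start + size).
    """
    counts = defaultdict(int)
    for ts in timestamps:
        start = (ts // size) * size  # align the timestamp to its window start
        counts[start] += 1
    return dict(sorted(counts.items()))


events = [0, 1, 2, 3, 4, 4, 5, 7, 8]
per_window = tumbling_window_counts(events, size=3)
print(per_window)  # {0: 3, 3: 4, 6: 2}
```

A per-window value such as these counts is the kind of quantity a quality check can then be evaluated against, one window at a time.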
Perfect for
Data Engineers building robust, end-to-end streaming pipelines
Data Scientists ensuring model input quality
MLOps Engineers monitoring production data flows
Analytics Teams maintaining dashboard reliability
Data Enthusiasts exploring the state-of-the-art in data quality
Next Steps#
Ready to dive in? Here are some suggested paths:
New to Stream DaQ? → Start with 👋 Our Manifesto
Starving for action? → Jump straight to the ⚡ 5-Minute Quickstart
Eager to deepen understanding? → Read 📚 Concepts
Looking for examples? → Check out 💡 Examples
Need detailed configuration? → Browse User Guide Overview
Support & Community#
We are a small, dedicated team committed to making Stream DaQ the best it can be. Stream DaQ is and will always be free and open-source. We really appreciate your support in making this project better. Here are some ways you can help:
🐛 Report bugs: GitHub Issues
💬 Ask questions: GitHub Discussions
⭐ Star the project: GitHub Repository
📧 Contact the team:
papster at csd.auth.gr - Vassilis, primary maintainer 👷‍♂️
gounaria at the same domain - Anastasios, project supervisor 🦸
Acknowledgments#
Stream DaQ is developed by the Data Engineering (DELAB) Team of Datalab AUTh, under the supervision of Prof. Anastasios Gounaris. Special thanks to Maria Kavouridou for giving birth to Quacklity, the Stream DaQ mascot!