⚡ 5-Minute Quickstart
Let's get Stream DaQ running with a complete example that monitors data quality in real time. You'll have a working monitoring setup in less than 5 minutes!
Step 1: Install Stream DaQ
pip install streamdaq
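To confirm the install worked before moving on, a quick check like the one below can help. It is a minimal sketch using only the Python standard library (3.8+), and it assumes the distribution name matches the `pip install streamdaq` command above. `installed_version` is a hypothetical helper written for this example, not part of Stream DaQ.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print(installed_version("streamdaq"))
```

If this prints `None`, the package was most likely installed into a different Python environment than the one running the script.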
Step 2: Create Your First Monitor
Create a new Python file called my_first_monitor.py and add this code:
from streamdaq import StreamDaQ, DaQMeasures as dqm, Windows
import pandas as pd
from datetime import datetime, timedelta
import time

# Sample streaming data (simulating real-time events)
def generate_sample_data():
    """Generate sample e-commerce events"""
    events = []
    base_time = datetime.now()
    for i in range(50):
        event = {
            'user_id': f'user_{i % 10}',  # 10 different users
            'event_type': 'purchase' if i % 3 == 0 else 'view',
            'amount': round(10 + (i * 1.5) % 100, 2),
            'timestamp': base_time + timedelta(seconds=i * 2),
            'session_id': f'session_{i // 5}'  # 5 events per session
        }
        events.append(event)
    return pd.DataFrame(events)

# Step 1: Set up your data quality monitor
daq = StreamDaQ().configure(
    window=Windows.tumbling(10),  # 10-second windows
    instance="user_id",           # Monitor per user
    time_column="timestamp",      # Use timestamp for windowing
    wait_for_late=2,              # Wait 2 seconds for late data
    time_format=None              # Auto-detect datetime format
)

# Step 2: Define what "good data quality" means for your use case
daq.add(dqm.count('event_type'), assess=">0", name="has_events") \
    .add(dqm.distinct_count('event_type'), assess=">=1", name="event_variety") \
    .add(dqm.max('amount'), assess="<=200", name="reasonable_amounts") \
    .add(dqm.mean('amount'), assess="(10, 150)", name="avg_amount_range")

# Step 3: Start monitoring (this will process your data stream)
print("Starting Stream DaQ monitoring...")
print("Processing sample e-commerce events...")

# Simulate streaming data
sample_data = generate_sample_data()
results = daq.watch_out(sample_data)
print("✅ Monitoring complete! Check the results above.")
Step 3: Run Your Monitor
Run your Python script:
python my_first_monitor.py
You should see output similar to this:
Starting Stream DaQ monitoring...
Processing sample e-commerce events...
| user_id | window_start        | window_end          | has_events | event_variety | reasonable_amounts | avg_amount_range |
|---------|---------------------|---------------------|------------|---------------|--------------------|------------------|
| user_0  | 2024-01-15 10:00:00 | 2024-01-15 10:00:10 | (5, True)  | (2, True)     | (45.5, True)       | (32.1, True)     |
| user_1  | 2024-01-15 10:00:00 | 2024-01-15 10:00:10 | (3, True)  | (1, True)     | (89.2, True)       | (65.4, True)     |
✅ Monitoring complete! Check the results above.
Congratulations!
You just:
✓ Monitored 4 different quality metrics across streaming data
✓ Got real-time results for each user and time window
✓ Received pass/fail assessments for each quality check
✓ Handled windowing and late data automatically
Understanding Your Results
Each row represents quality metrics for one user in one time window:
(5, True): Found 5 events in the window, passed the ">0" check ✓
(2, True): Found 2 distinct event types, passed the ">=1" check ✓
(45.5, True): Max amount was 45.5, passed the "<=200" check ✓
(32.1, True): Average amount was 32.1, passed the "(10, 150)" range check ✓
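To make the (value, passed) pairs concrete, here is a rough sketch of how a measured value might be paired with the outcome of an assessment rule. This is not Stream DaQ's implementation: `check` is a hypothetical helper, and treating the bounds of "(10, 150)" as exclusive is an assumption made for this example.

```python
import operator
import re

# Comparison symbols, two-character ones first so ">=" is tried before ">".
_OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def check(value, rule):
    """Return (value, passed) for a rule such as ">0" or "(10, 150)"."""
    rule = rule.strip()
    # Range rule "(low, high)"; bounds are treated as exclusive (an assumption).
    match = re.fullmatch(r"\(\s*(-?[\d.]+)\s*,\s*(-?[\d.]+)\s*\)", rule)
    if match:
        low, high = float(match.group(1)), float(match.group(2))
        return (value, low < value < high)
    # Comparison rule: an operator followed by a number.
    for symbol, compare in _OPERATORS.items():
        if rule.startswith(symbol):
            return (value, compare(value, float(rule[len(symbol):])))
    raise ValueError(f"Unsupported rule: {rule!r}")


print(check(5, ">0"))            # (5, True)
print(check(2, ">=1"))           # (2, True)
print(check(45.5, "<=200"))      # (45.5, True)
print(check(32.1, "(10, 150)"))  # (32.1, True)
print(check(250, "<=200"))       # (250, False)
```

The last call shows what a failing check looks like: the measured value is still reported, with False in the second position.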
What Just Happened?
Data Streaming: Stream DaQ processed your data as if it were coming from a real-time stream
Windowing: Data was grouped into 10-second tumbling windows per user
Quality Assessment: Each window was checked against your 4 quality rules
Real-time Results: You got immediate feedback on data quality as a stream of results
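The windowing and per-user grouping described above can be approximated in plain Python. The sketch below is only an illustration under stated assumptions: windows are aligned to the first event, there is no late data, and a fixed `base_time` replaces `datetime.now()` so the run is repeatable. It does not use Stream DaQ and makes no claim about how the library aligns or evaluates windows internally.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

WINDOW_SECONDS = 10
base_time = datetime(2024, 1, 15, 10, 0, 0)  # fixed start, chosen for repeatability

# Events with the same shape as the quickstart's sample data.
events = [
    {
        "user_id": f"user_{i % 10}",
        "event_type": "purchase" if i % 3 == 0 else "view",
        "amount": round(10 + (i * 1.5) % 100, 2),
        "timestamp": base_time + timedelta(seconds=i * 2),
    }
    for i in range(50)
]

# Assign each event to a (user, window index) bucket.
buckets = defaultdict(list)
for event in events:
    offset = (event["timestamp"] - base_time).total_seconds()
    window_index = int(offset // WINDOW_SECONDS)
    buckets[(event["user_id"], window_index)].append(event)

# Compute the four quickstart measures for every bucket.
summary = {}
for key in sorted(buckets):
    rows = buckets[key]
    amounts = [row["amount"] for row in rows]
    summary[key] = {
        "count": len(rows),
        "distinct_event_types": len({row["event_type"] for row in rows}),
        "max_amount": max(amounts),
        "mean_amount": mean(amounts),
    }

# Show the first few (user, window) results.
for key in list(summary)[:3]:
    print(key, summary[key])
```

With this generator, events arrive every 2 seconds and cycle through 10 users, so each user contributes at most one event to any 10-second window. The figures you get from the sample data will therefore differ from the illustrative output table.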
Next Steps
Now that you've seen Stream DaQ in action:
Learn the concepts: Concepts - Understand windows, measures, and assessments
Go deeper: 🎯 First Monitoring - Build a monitoring setup step-by-step
💡 See more examples: 💡 Examples - Explore real-world use cases
⚙️ Advanced config: User Guide Overview - Master all configuration options
Try This Next
Modify the assessment criteria in your code:
Change assess=">0" to assess=">3" and see what happens
Try assess="==2" for event variety
Experiment with different window sizes using Windows.tumbling(5)
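Before changing the window size, it can help to estimate how the sample events would be split. The sketch below uses only the standard library and assumes windows aligned to the first event. `events_per_window` is a hypothetical helper for this example, and the counts are across all users, before any per-user grouping.

```python
from collections import Counter

# Event offsets in seconds, matching the quickstart generator (one event every 2 s).
offsets = [i * 2 for i in range(50)]


def events_per_window(offsets, window_seconds):
    """Count events in each tumbling window, keyed by the window's start offset."""
    return Counter((t // window_seconds) * window_seconds for t in offsets)


for size in (10, 5):
    counts = events_per_window(offsets, size)
    print(f"{size}s windows: {len(counts)} windows, "
          f"{min(counts.values())}-{max(counts.values())} events each")
```

Halving the window size doubles the number of windows and roughly halves the events in each one. Because each user appears at most once per window with this sample data, tightening has_events to ">3" or event_variety to "==2" would be expected to produce failing checks.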