Firebase Analytics Import Pipeline
Description¶
Enable importing Firebase Analytics data into QualCoder for mixed-methods triangulation. Researchers collect behavioral data (Firebase events) alongside qualitative data (interview transcripts). This feature bridges the gap by:
- Aggregating raw Firebase event data into per-participant profiles
- Importing those profiles as Case attributes (engagement_tier, sessions_per_week, etc.)
- Linking cases to transcript sources for cross-tabulation of codes × behavioral segments
Supports the triangulation workflow: Firebase (what users DO) + Transcripts (what users SAY) + Coded themes (what it MEANS).
Data flow: S3 (raw Firebase export) → DVC pull → aggregate script → import as Case attributes → link to transcripts → code → cross-tab by Firebase segments.
Acceptance Criteria¶
- [ ] #1 Researcher can import Firebase BigQuery export (JSON) as aggregated Case attributes
- [ ] #2 Researcher can import pre-aggregated Firebase CSV (one row per participant)
- [ ] #3 Import auto-detects attribute types (NUMBER for metrics, TEXT for segments, DATE for timestamps)
- [ ] #4 Import maps Firebase user_id to Case name (configurable ID column)
- [ ] #5 Import merges with existing Cases (updates attributes, doesn't create duplicates)
- [ ] #6 Researcher can preview data before importing (column mapping UI)
- [ ] #7 Agent can import Firebase data and link to transcript sources via MCP tools
- [ ] #8 Aggregation script template provided for BigQuery → participant profile CSV
- [ ] #9 DVC pipeline stage template for Firebase → QualCoder workflow
Implementation Notes¶
Three Import Modes¶
Mode 1: Pre-aggregated CSV (simplest, extends existing import_survey_csv)
name,sessions_per_week,avg_session_sec,engagement_tier,top_feature,active_days
user_042,12,340,power_user,export,28
user_043,2,120,casual,search,5
import_survey_csv to auto-detect NUMBER/DATE types (currently all TEXT)
- Add merge behavior: update existing case attributes instead of failing on duplicate names
Mode 2: Firebase BigQuery JSON export
{"user_id": "abc", "event_name": "screen_view", "event_timestamp": "2026-03-10T...", ...}
{"user_id": "abc", "event_name": "feature_used", "event_params": {"feature": "export"}, ...}
firebase_parser.py in exchange/infra/
- Aggregates raw events per user_id into summary metrics
- Configurable aggregation rules (count, avg, max, first/last, categorical)
Mode 3: Firebase BigQuery SQL → CSV (template only) - Provide a SQL template that researchers run in BigQuery console - Output is Mode 1 CSV — no parsing needed in QualCoder
New/Modified Files¶
# New
src/contexts/exchange/infra/firebase_parser.py # Parse + aggregate Firebase JSON
src/contexts/exchange/core/commandHandlers/import_firebase.py # Firebase-specific import
scripts/aggregate_firebase.sql # BigQuery template
scripts/aggregate_firebase.py # Python aggregation script
# Modified
src/contexts/exchange/core/commandHandlers/import_survey_csv.py # Add type detection, merge
src/contexts/exchange/infra/csv_parser.py # Add type inference
src/contexts/exchange/interface/mcp_tools.py # Add import_firebase tool
src/contexts/exchange/core/commands.py # ImportFirebaseCommand
Type Detection Logic¶
def infer_attribute_type(values: list[str]) -> str:
"""Infer CaseAttribute type from sample values."""
if all(is_numeric(v) for v in values if v):
return "NUMBER"
if all(is_date(v) for v in values if v):
return "DATE"
if all(v.lower() in ("true", "false") for v in values if v):
return "BOOLEAN"
return "TEXT"
Merge Behavior¶
When importing and a Case with the same name already exists:
- UPDATE existing attributes (overwrite with new values)
- ADD new attributes that didn't exist before
- PRESERVE attributes not in the import (don't delete)
- Publish CaseAttributeSet events for each change
Triangulation Query Support¶
After import, the existing MCP tools enable triangulation:
- suggest_case_groupings(attribute_names=["engagement_tier"]) → group cases by Firebase segment
- compare_cases(case_ids=[...], code_ids=[...]) → compare coded themes across segments
- Cross-tab: codes × engagement_tier → convergence/divergence matrix
DVC Pipeline Stage¶
# Add to dvc.yaml
stages:
aggregate-firebase:
cmd: python scripts/aggregate_firebase.py raw/firebase_export.json processed/profiles.csv
deps: [raw/firebase_export.json, scripts/aggregate_firebase.py]
outs: [processed/profiles.csv]
Sub-tasks¶
- [ ] QC-051.01 - Enhance CSV import: type detection and case merge behavior
- [ ] QC-051.02 - Firebase BigQuery JSON parser and aggregator
- [ ] QC-051.03 - ImportFirebaseCommand handler
- [ ] QC-051.04 - Column mapping preview UI
- [ ] QC-051.05 - MCP tool: import_firebase
- [ ] QC-051.06 - BigQuery SQL template and Python aggregation script
- [ ] QC-051.07 - DVC pipeline stage template
- [ ] QC-051.08 - E2E tests
- [ ] QC-051.09 - User documentation: Firebase triangulation workflow