Interaction Threads and Conversation Containers: A Clean Architecture for Behavioural Analysis

Tags: architecture, domain-modelling, behavioural-analysis

When you build a system that assesses participant behaviour (privilege dynamics, toxic patterns, escalation, coercion, responsiveness), you need an interaction boundary that reflects an actual exchange between participants, not just however messages happen to be filed.

A common failure mode is to use a single “conversation” concept for two incompatible jobs:

  • Behavioural boundary: the unit where behaviour emerges and can be inferred.
  • User curation scope: a flexible bucket users can rename, merge, split, and reorganize.

The first must be stable and semantically coherent. The second must be mutable and user-controlled. Conflating them produces silent no-ops, confusing semantics (“success” without work), and invalid inference (cross-context contamination).

This article defines a “perfect-architecture” model that separates these concerns, specifies the invariants, and sketches an implementation path with migration notes.


The core thesis

Compute behavioural analyses on interaction threads. Aggregate and report on user containers.

  • InteractionThread is the only valid compute primitive.
  • ConversationContainer is a mutable scope primitive.

This split makes a strong promise:

  • Behavioural inference never spans unrelated interactions.
  • User organization never corrupts the behavioural signal.
  • Caching and reproducibility become first-class.

Domain model (three layers)

1) Message (atomic event)

A message is an immutable event with:

  • authorship (participant)
  • timestamp
  • content (text + optional rich)
  • provenance (provider IDs, headers)

Messages do not “move” in history; instead, membership and inclusion/exclusion are expressed with edges.

2) InteractionThread (behavioural boundary)

An InteractionThread is the smallest unit on which behaviour is valid to infer.

  • Append-only: new messages can be added.
  • Correctable: identity can be refined via merge/split lineage.
  • Deterministic transcript: built via canonicalization rules.

3) ConversationContainer (user scope)

A ConversationContainer is a user-defined curation scope.

  • Fully mutable membership
  • UX state (active/archived)
  • Optional project scope
  • Supports selective inclusion and message-level exclusions
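
The three layers can be sketched as plain types. This is a minimal illustration, not the article's schema: the field names are assumptions, and the in-memory sets on the container stand in for the edge tables described later.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Message:
    """Atomic event. frozen=True makes instances immutable after creation."""
    id: str
    thread_id: str
    author_id: str            # authorship (participant)
    sent_at: datetime         # timestamp
    content: str              # text content
    provider_message_id: str  # provenance


@dataclass
class InteractionThread:
    """Behavioural boundary: messages are appended, never edited or moved."""
    id: str
    messages: list[Message] = field(default_factory=list)

    def append(self, message: Message) -> None:
        if message.thread_id != self.id:
            raise ValueError("message belongs to a different thread")
        self.messages.append(message)


@dataclass
class ConversationContainer:
    """User curation scope: everything here is free to change."""
    id: str
    title: str
    status: str = "active"  # or "archived"
    thread_ids: set[str] = field(default_factory=set)
    excluded_message_ids: set[str] = field(default_factory=set)


msg = Message("m1", "t1", "p1", datetime(2024, 1, 1, tzinfo=timezone.utc),
              "hello", "provider-123")
thread = InteractionThread("t1")
thread.append(msg)

container = ConversationContainer("c1", "Untitled")
container.title = "Contract negotiation"   # renaming is a container concern
container.thread_ids.add(thread.id)        # membership is mutable
container.excluded_message_ids.add(msg.id) # exclusion does not touch the thread
```

Note that excluding a message in the container leaves `thread.messages` untouched, which is the point of keeping the two primitives separate.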

ASCII diagrams

Domain model

┌──────────────────────────────────────────────────────────────────────────┐
│                                DOMAIN                                    │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────────────┐        ┌──────────────────────┐                    │
│  │ Message          │  *→1   │ InteractionThread    │                    │
│  │ (immutable)      │────────│ (compute primitive)  │                    │
│  └──────────────────┘        └──────────────────────┘                    │
│           │                            ▲   ▲                             │
│           │                            │   │                             │
│           ▼                            │   │                             │
│  ┌──────────────────┐        ┌──────────────────────┐                    │
│  │ Participant      │  1→*   │ thread_participants  │                    │
│  │ (identity graph) │◀───────│ (edges, auditability)│                    │
│  └──────────────────┘        └──────────────────────┘                    │
│                                                                          │
│                          ┌──────────────────────┐                        │
│                          │ ConversationContainer│                        │
│                          │ (mutable scope)      │                        │
│                          └──────────┬───────────┘                        │
│                                     │ *→*                                │
│                                     ▼                                    │
│                          ┌──────────────────────┐                        │
│                          │ container_threads    │                        │
│                          │ (membership edges)   │                        │
│                          └──────────┬───────────┘                        │
│                                     │ 0..*                               │
│                                     ▼                                    │
│                          ┌──────────────────────┐                        │
│                          │ container_exclusions │                        │
│                          │ (scoped redactions)  │                        │
│                          └──────────────────────┘                        │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

Project → Container → Thread hierarchy

┌─────────────────────────────────────────────────────────────────────────────┐
│                          PROJECT (Scope Root)                               │
│                                                                             │
│   ProjectSources = project_documents + project_containers                   │
│                                                                             │
│   ┌─────────────────────┐         ┌─────────────────────────────────────┐   │
│   │  project_documents  │         │  project_containers                 │   │
│   │  (existing)         │         │  ┌─────────────────────────────┐    │   │
│   │                     │         │  │ added_at, added_by          │    │   │
│   │  ┌───────────────┐  │         │  │ removed_at, removed_by      │    │   │
│   │  │ Document      │  │         │  │ (temporal membership)       │    │   │
│   │  │ Document      │  │         │  └─────────────────────────────┘    │   │
│   │  │ Document      │  │         │                │                    │   │
│   │  └───────────────┘  │         │                ▼                    │   │
│   └─────────────────────┘         │  ┌─────────────────────────────┐    │   │
│                                   │  │ ConversationContainer       │    │   │
│                                   │  │ (USER SCOPE PRIMITIVE)      │    │   │
│                                   │  │                             │    │   │
│                                   │  │ • title, description        │    │   │
│                                   │  │ • ai_summary, ai_topics     │    │   │
│                                   │  │ • status (active/archived)  │    │   │
│                                   │  └──────────────┬──────────────┘    │   │
│                                   │                 │                   │   │
│                                   └─────────────────┼───────────────────┘   │
│                                                     │                       │
│                                                     ▼                       │
│                                   ┌─────────────────────────────────────┐   │
│                                   │  container_threads                  │   │
│                                   │  ┌─────────────────────────────┐    │   │
│                                   │  │ added_at, removed_at        │    │   │
│                                   │  │ inclusion_policy            │    │   │
│                                   │  └─────────────────────────────┘    │   │
│                                   │                │                    │   │
│                                   │  container_message_exclusions       │   │
│                                   │  ┌─────────────────────────────┐    │   │
│                                   │  │ excluded_at, reinstated_at  │    │   │
│                                   │  │ reason (aggregation-only)   │    │   │
│                                   │  └─────────────────────────────┘    │   │
│                                   └─────────────────┬───────────────────┘   │
└─────────────────────────────────────────────────────┼───────────────────────┘
                                                      │
══════════════════════════════════════════════════════╪═══════════════════════
                    SCOPE BOUNDARY                    │
══════════════════════════════════════════════════════╪═══════════════════════
                                                      │
┌─────────────────────────────────────────────────────┼───────────────────────┐
│                       COMPUTE LAYER (Truth)         │                       │
│                                                     ▼                       │
│                       ┌─────────────────────────────────────────────────┐   │
│                       │  InteractionThread                              │   │
│                       │  (SOLE COMPUTE PRIMITIVE)                       │   │
│                       │                                                 │   │
│                       │  • interaction_key (canonical, versioned)       │   │
│                       │  • provider_thread_id (provenance)              │   │
│                       │  • transcript_hash, transcript_version          │   │
│                       │  • status (active/merged/split_source)          │   │
│                       └────────────────────┬────────────────────────────┘   │
│                                            │                                │
│            ┌───────────────────────────────┼───────────────────────────┐    │
│            │                               │                           │    │
│            ▼                               ▼                           ▼    │
│   ┌─────────────────┐           ┌─────────────────┐         ┌───────────┐   │
│   │ thread_         │           │ transcript_     │         │ messages  │   │
│   │ participants    │           │ artifacts       │         │           │   │
│   │                 │           │                 │         │ (immutable│   │
│   │ • first_seen_at │           │ • content_hash  │         │  events)  │   │
│   │ • last_seen_at  │           │ • canon_version │         │           │   │
│   │ • thread_role   │           │ • truncation_   │         │           │   │
│   │                 │           │   metadata      │         │           │   │
│   └─────────────────┘           └─────────────────┘         └───────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Key: Projects bind to SCOPE (Containers); Compute binds to TRUTH (Threads)
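
To make the scope/truth split concrete, here is a small sketch of a container-level rollup that applies message exclusions at aggregation time only. The shape of a "finding" (a dict with an `evidence_message_ids` list) is an assumption made for illustration; the article does not define it.

```python
def container_rollup(thread_ids, findings_by_thread, excluded_message_ids):
    """Combine per-thread findings for one container.

    Thread-level findings are read, never modified. Evidence pointing at
    excluded messages is dropped from the rollup, and a finding with no
    remaining evidence is omitted entirely.
    """
    rollup = []
    for thread_id in sorted(thread_ids):
        for finding in findings_by_thread.get(thread_id, []):
            evidence = [m for m in finding["evidence_message_ids"]
                        if m not in excluded_message_ids]
            if evidence:
                rollup.append({**finding,
                               "thread_id": thread_id,
                               "evidence_message_ids": evidence})
    return rollup


# Hypothetical per-thread results produced by the compute layer.
findings_by_thread = {
    "t1": [{"stage_id": "toxic-patterns", "label": "escalation",
            "evidence_message_ids": ["m1", "m2"]}],
    "t2": [{"stage_id": "risk-assessment", "label": "coercion",
            "evidence_message_ids": ["m7"]}],
}

rollup = container_rollup({"t1", "t2"}, findings_by_thread,
                          excluded_message_ids={"m2", "m7"})
```

Two containers holding the same thread can apply different exclusions and still share one set of thread-level results, because the filter never writes back to `findings_by_thread`.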

Analysis flow

┌──────────────────────────────────────────────────────────────────────────┐
│                             ANALYSIS FLOW                                 │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌─────────────────────────────────────────────────────────────────────┐ │
│  │                        COMPUTE PHASE                                │ │
│  │                    (operates on Threads)                            │ │
│  ├─────────────────────────────────────────────────────────────────────┤ │
│  │                                                                     │ │
│  │  InteractionThread                                                  │ │
│  │      │                                                              │ │
│  │      ▼                                                              │ │
│  │  Fetch messages WHERE deleted_at IS NULL AND hard_redacted = false  │ │
│  │      │                                                              │ │
│  │      ▼                                                              │ │
│  │  Canonicalize transcript (versioned contract)                       │ │
│  │      │                                                              │ │
│  │      ▼                                                              │ │
│  │  transcript_hash = sha256("v{version}:{canonical_transcript}")      │ │
│  │      │                                                              │ │
│  │      ▼                                                              │ │
│  │  Store transcript_artifact (compressed, with truncation metadata)   │ │
│  │      │                                                              │ │
│  │      ▼                                                              │ │
│  │  Compute stages per thread                                          │ │
│  │   ├─ privilege-detection                                            │ │
│  │   ├─ toxic-patterns                                                 │ │
│  │   └─ risk-assessment                                                │ │
│  │      │                                                              │ │
│  │      ▼                                                              │ │
│  │  analysis_results keyed by:                                         │ │
│  │    (stage_id, thread_id, transcript_hash,                           │ │
│  │     model_id, prompt_version, extractor_version)                    │ │
│  │                                                                     │ │
│  └─────────────────────────────────────────────────────────────────────┘ │
│                              │                                           │
│                              ▼                                           │
│  ┌─────────────────────────────────────────────────────────────────────┐ │
│  │                      AGGREGATION PHASE                              │ │
│  │               (operates on Containers/Projects)                     │ │
│  ├─────────────────────────────────────────────────────────────────────┤ │
│  │                                                                     │ │
│  │  Project / Container                                                │ │
│  │      │                                                              │ │
│  │      ▼                                                              │ │
│  │  Resolve threads via project_containers → container_threads         │ │
│  │      │                                                              │ │
│  │      ▼                                                              │ │
│  │  Fetch thread analyses WHERE transcript_hash matches current        │ │
│  │      │                                                              │ │
│  │      ▼                                                              │ │
│  │  Apply container_message_exclusions (aggregation-time only)         │ │
│  │      │                                                              │ │
│  │      ▼                                                              │ │
│  │  Synthesize rollups + narratives with evidence pointers             │ │
│  │      │                                                              │ │
│  │      ▼                                                              │ │
│  │  Snapshot scope + provenance into report manifest                   │ │
│  │      │                                                              │ │
│  │      ▼                                                              │ │
│  │  Report (immutable artifact)                                        │ │
│  │                                                                     │ │
│  └─────────────────────────────────────────────────────────────────────┘ │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
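
The compute phase above can be sketched in a few functions. The hash formula follows the diagram; everything else is an assumption made for illustration, in particular the toy canonicalization rules (sort by timestamp and id, collapse whitespace), the dict-shaped messages, and the constant `CANON_VERSION`.

```python
import hashlib
from typing import NamedTuple

CANON_VERSION = 1  # hypothetical version of the canonicalization contract


def visible(messages):
    """Mirror of: WHERE deleted_at IS NULL AND hard_redacted = false."""
    return [m for m in messages
            if m.get("deleted_at") is None and not m.get("hard_redacted", False)]


def canonicalize(messages):
    """Toy canonical form: stable ordering plus whitespace normalization.

    Assumes sent_at values are ISO-8601 UTC strings in one format, so that
    string ordering equals chronological ordering.
    """
    ordered = sorted(messages, key=lambda m: (m["sent_at"], m["id"]))
    lines = []
    for m in ordered:
        text = " ".join(m["content"].split())
        lines.append(f'{m["sent_at"]}\t{m["author_id"]}\t{text}')
    return "\n".join(lines)


def transcript_hash(canonical, version=CANON_VERSION):
    """sha256 over "v{version}:{canonical_transcript}", as in the diagram."""
    return hashlib.sha256(f"v{version}:{canonical}".encode("utf-8")).hexdigest()


class AnalysisKey(NamedTuple):
    """Cache key for one analysis result."""
    stage_id: str
    thread_id: str
    transcript_hash: str
    model_id: str
    prompt_version: str
    extractor_version: str


def is_current(key, current_hash):
    """Aggregation only uses results computed on the current transcript."""
    return key.transcript_hash == current_hash


messages = [
    {"id": "m2", "sent_at": "2024-01-01T10:05:00Z", "author_id": "p2", "content": "hi"},
    {"id": "m1", "sent_at": "2024-01-01T10:00:00Z", "author_id": "p1",
     "content": "hello   world"},
]
current = transcript_hash(canonicalize(visible(messages)))
key = AnalysisKey("toxic-patterns", "t1", current, "model-x", "p3", "e1")
```

Because the canonicalization version is part of the hashed payload, changing the rules invalidates every cached result without any explicit cache flush.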

Report snapshot model

┌──────────────────────────────────────────────────────────────────────────┐
│                         REPORT SNAPSHOT MODEL                            │
│                    (Auditable + Reproducible Artifacts)                  │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                     SCOPE SNAPSHOT (as of T)                       │  │
│  ├────────────────────────────────────────────────────────────────────┤  │
│  │                                                                    │  │
│  │   Container membership:                                            │  │
│  │   ┌──────────────────────────────────────────────────────────┐     │  │
│  │   │ container_threads WHERE added_at <= T                    │     │  │
│  │   │                     AND (removed_at IS NULL              │     │  │
│  │   │                          OR removed_at > T)              │     │  │
│  │   └──────────────────────────────────────────────────────────┘     │  │
│  │                                                                    │  │
│  │   Message exclusions:                                              │  │
│  │   ┌──────────────────────────────────────────────────────────┐     │  │
│  │   │ container_message_exclusions WHERE excluded_at <= T      │     │  │
│  │   │                              AND (reinstated_at IS NULL  │     │  │
│  │   │                                   OR reinstated_at > T)  │     │  │
│  │   └──────────────────────────────────────────────────────────┘     │  │
│  │                                                                    │  │
│  │   Documents:                                                       │  │
│  │   ┌──────────────────────────────────────────────────────────┐     │  │
│  │   │ project_documents linked at time T                       │     │  │
│  │   └──────────────────────────────────────────────────────────┘     │  │
│  │                                                                    │  │
│  │   Storage: manifest.toml "scope" section                           │  │
│  │                                                                    │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                   PROVENANCE SNAPSHOT                              │  │
│  ├────────────────────────────────────────────────────────────────────┤  │
│  │                                                                    │  │
│  │   Analysis references:                                             │  │
│  │   ┌──────────────────────────────────────────────────────────┐     │  │
│  │   │ report_analysis_snapshots                                │     │  │
│  │   │   • report_id                                            │     │  │
│  │   │   • analysis_result_id                                   │     │  │
│  │   │                                                          │     │  │
│  │   │ Each analysis_result includes:                           │     │  │
│  │   │   • thread_id                                            │     │  │
│  │   │   • transcript_hash                                      │     │  │
│  │   │   • stage_id                                             │     │  │
│  │   │   • model_id, prompt_version, extractor_version          │     │  │
│  │   └──────────────────────────────────────────────────────────┘     │  │
│  │                                                                    │  │
│  │   Transcript artifacts:                                            │  │
│  │   ┌──────────────────────────────────────────────────────────┐     │  │
│  │   │ transcript_artifacts (immutable by design)               │     │  │
│  │   │   • thread_id                                            │     │  │
│  │   │   • transcript_hash                                      │     │  │
│  │   │   • canonicalization_version                             │     │  │
│  │   │   • content_compressed                                   │     │  │
│  │   └──────────────────────────────────────────────────────────┘     │  │
│  │                                                                    │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                         INVARIANT                                  │  │
│  ├────────────────────────────────────────────────────────────────────┤  │
│  │                                                                    │  │
│  │   Report rendering reads ONLY from:                                │  │
│  │                                                                    │  │
│  │     1. manifest.scope (frozen container/document list)             │  │
│  │     2. report_analysis_snapshots (frozen analysis refs)            │  │
│  │     3. transcript_artifacts (immutable)                            │  │
│  │                                                                    │  │
│  │   NEVER re-query live project_containers or container_threads      │  │
│  │   for display. Container changes after report creation do NOT      │  │
│  │   change report content.                                           │  │
│  │                                                                    │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
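
The rendering invariant can be illustrated with a frozen manifest object. This is a sketch under assumptions: the class and field names are hypothetical, and serialization to the manifest file is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReportManifest:
    """Frozen scope and provenance for one report."""
    report_id: str
    as_of: datetime
    thread_ids: tuple[str, ...]
    excluded_message_ids: tuple[str, ...]
    analysis_result_ids: tuple[str, ...]


def snapshot_report(report_id, as_of, thread_ids, excluded_message_ids,
                    analysis_result_ids):
    # Copy into sorted tuples so later changes to the live collections
    # cannot leak into the report.
    return ReportManifest(
        report_id=report_id,
        as_of=as_of,
        thread_ids=tuple(sorted(thread_ids)),
        excluded_message_ids=tuple(sorted(excluded_message_ids)),
        analysis_result_ids=tuple(sorted(analysis_result_ids)),
    )


def render(manifest):
    # Rendering takes only the manifest; it has no handle on live tables.
    return (f"report {manifest.report_id}: "
            f"{len(manifest.thread_ids)} thread(s), "
            f"{len(manifest.excluded_message_ids)} exclusion(s)")


live_threads = {"t1", "t2"}
manifest = snapshot_report("r1", datetime(2024, 3, 1, tzinfo=timezone.utc),
                           live_threads, {"m9"}, ["a2", "a1"])
live_threads.add("t3")  # the user reorganizes the container afterwards
summary = render(manifest)
```

Adding `t3` to the live set after the snapshot has no effect on `summary`, which is the behaviour the invariant asks for.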

Temporal correctness invariants (required)

The report snapshot model relies on temporal edge queries (membership/exclusion “as of T”). For those queries to be correct and safe, the temporal fields must satisfy explicit invariants. Without these invariants, the schema is ambiguous and can drift into silent corruption (e.g., an edge that is “removed before it was added”).

Interval semantics

Use half-open intervals for all temporal edges:

  • Membership/exclusion is active over [start, end).
  • start is inclusive (<= T).
  • end is exclusive (> T); if end == T, it is considered inactive at time T.

This matches the snapshot predicates shown above:

  • added_at <= T AND (removed_at IS NULL OR removed_at > T)
  • excluded_at <= T AND (reinstated_at IS NULL OR reinstated_at > T)
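
The half-open predicate is small enough to state directly in code. The function and the tuple layout of an edge below are illustrative names, not part of the schema.

```python
from datetime import datetime, timezone
from typing import Optional


def active_at(start: datetime, end: Optional[datetime], t: datetime) -> bool:
    """True if t lies in the half-open interval [start, end).

    end=None means the interval is still open.
    """
    return start <= t and (end is None or end > t)


def members_as_of(edges, t):
    """edges: iterable of (thread_id, added_at, removed_at) tuples."""
    return sorted(thread_id for thread_id, added_at, removed_at in edges
                  if active_at(added_at, removed_at, t))


def utc(day):
    return datetime(2024, 1, day, tzinfo=timezone.utc)


edges = [
    ("t1", utc(1), None),      # still a member
    ("t2", utc(1), utc(10)),   # removed on the 10th
    ("t3", utc(10), None),     # added on the 10th
]
on_the_10th = members_as_of(edges, utc(10))
```

At T equal to the 10th, `t2` is already inactive (its end is exclusive) while `t3` is already active (its start is inclusive), so a removal and an addition recorded at the same instant never produce a gap or a double count.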

Required invariants

For every temporal membership edge (Project→Container and Container→Thread):

  • Ordering: removed_at IS NULL OR removed_at > added_at
  • Audit pairing: removed_at IS NULL implies removed_by IS NULL (and if removed_at IS NOT NULL, removed_by IS NOT NULL)
  • No overlapping active membership for the same pair:
    • at most one “active” edge per (project_id, container_id) where removed_at IS NULL
    • at most one “active” edge per (container_id, thread_id) where removed_at IS NULL
  • Re-add is a new interval: if a membership is removed and later re-added, insert a new row. Do not “unset” removed_at; doing so rewrites history and makes earlier snapshots ambiguous.

For every temporal exclusion edge (Container→Message exclusion):

  • Ordering: reinstated_at IS NULL OR reinstated_at > excluded_at
  • Audit pairing: reinstated_at IS NULL implies reinstated_by IS NULL (and if reinstated_at IS NOT NULL, reinstated_by IS NOT NULL)
  • No overlapping active exclusion for the same pair:
    • at most one “active” exclusion per (container_id, message_id) where reinstated_at IS NULL
  • Re-exclude is a new interval: if an exclusion is reinstated and later excluded again, insert a new row (or supersede the prior row in a lineage-safe way).
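
These invariants can also be checked in application code, for example in a migration audit or a test suite. The sketch below covers Container→Thread membership edges only; the `Edge` type and the wording of the messages are assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Edge:
    container_id: str
    thread_id: str
    added_at: datetime
    removed_at: Optional[datetime] = None
    removed_by: Optional[str] = None


def violations(edges):
    """Return a list of human-readable invariant violations."""
    problems = []
    for e in edges:
        pair = (e.container_id, e.thread_id)
        # Ordering: an interval must end strictly after it starts.
        if e.removed_at is not None and e.removed_at <= e.added_at:
            problems.append(f"{pair}: removed_at is not after added_at")
        # Audit pairing: removed_at and removed_by are set together or not at all.
        if (e.removed_at is None) != (e.removed_by is None):
            problems.append(f"{pair}: removed_at/removed_by not paired")
    # At most one open interval per (container, thread) pair.
    open_counts = Counter((e.container_id, e.thread_id)
                          for e in edges if e.removed_at is None)
    for pair, count in sorted(open_counts.items()):
        if count > 1:
            problems.append(f"{pair}: {count} active edges")
    return problems


def utc(day):
    return datetime(2024, 1, day, tzinfo=timezone.utc)


good = [
    Edge("c1", "t1", utc(1), utc(5), "alice"),  # removed...
    Edge("c1", "t1", utc(8)),                   # ...then re-added as a new row
]
bad = [
    Edge("c1", "t2", utc(5), utc(1), "alice"),  # removed before added
    Edge("c1", "t3", utc(1), utc(2), None),     # missing removed_by
    Edge("c1", "t4", utc(1)),
    Edge("c1", "t4", utc(2)),                   # second active edge
]
```

Here `violations(good)` is empty and `violations(bad)` reports one problem per broken rule.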

Enforcement recommendations (schema-level)

These invariants should be enforced by the database, not just by application code. Recommended mechanisms:

  • CHECK constraints:
    • removed_at IS NULL OR removed_at > added_at
    • (removed_at IS NULL) = (removed_by IS NULL)
    • reinstated_at IS NULL OR reinstated_at > excluded_at
    • (reinstated_at IS NULL) = (reinstated_by IS NULL)
  • Partial unique indexes (or equivalent constraints) to prevent overlapping active intervals:
    • unique (project_id, container_id) where removed_at IS NULL
    • unique (container_id, thread_id) where removed_at IS NULL
    • unique (container_id, message_id) where reinstated_at IS NULL
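
As a runnable illustration, the sketch below applies these mechanisms to container_threads using SQLite through Python's standard sqlite3 module. It is an assumption-laden example rather than the production schema: timestamps are ISO-8601 text (so string comparison matches time order), column types are simplified, the audit pairing is checked in both directions as the invariant states, and the index name is made up.

```python
import sqlite3

SCHEMA = """
CREATE TABLE container_threads (
    id           INTEGER PRIMARY KEY,
    container_id TEXT NOT NULL,
    thread_id    TEXT NOT NULL,
    added_at     TEXT NOT NULL,
    added_by     TEXT NOT NULL,
    removed_at   TEXT,
    removed_by   TEXT,
    -- Ordering: an interval ends strictly after it starts.
    CHECK (removed_at IS NULL OR removed_at > added_at),
    -- Audit pairing: both set or both NULL.
    CHECK ((removed_at IS NULL) = (removed_by IS NULL))
);
-- Partial unique index: at most one open interval per pair.
CREATE UNIQUE INDEX one_active_container_thread
    ON container_threads (container_id, thread_id)
    WHERE removed_at IS NULL;
"""

INSERT = """
INSERT INTO container_threads
    (container_id, thread_id, added_at, added_by, removed_at, removed_by)
VALUES (?, ?, ?, ?, ?, ?)
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn, row):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(INSERT, row)
        return True
    except sqlite3.IntegrityError:
        return False


def remove(conn, container_id, thread_id, removed_at, removed_by):
    """Close the currently open interval for a pair."""
    with conn:
        conn.execute(
            "UPDATE container_threads SET removed_at = ?, removed_by = ? "
            "WHERE container_id = ? AND thread_id = ? AND removed_at IS NULL",
            (removed_at, removed_by, container_id, thread_id),
        )


conn = open_db()
first = try_insert(conn, ("c1", "t1", "2024-01-01T00:00:00Z", "alice", None, None))
duplicate = try_insert(conn, ("c1", "t1", "2024-01-02T00:00:00Z", "bob", None, None))
remove(conn, "c1", "t1", "2024-01-05T00:00:00Z", "alice")
readded = try_insert(conn, ("c1", "t1", "2024-01-08T00:00:00Z", "bob", None, None))
```

The duplicate active row is rejected, while the re-add after removal is accepted as a new interval. The same pattern carries over to project_containers and container_message_exclusions with their own column names.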

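The result is a schema in which inverted intervals cannot be stored and each pair has at most one open edge, so the snapshot queries behave as specified under the stated interval semantics and “time-travel” bugs are rejected before they reach production. One caveat: a partial unique index only guards rows that are still open. Preventing a closed historical interval from overlapping another interval for the same pair requires a range-based constraint (for example, a PostgreSQL exclusion constraint) or an application-level check at write time.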

Identity resolution graph

┌──────────────────────────────────────────────────────────────────────────┐
│                      PARTICIPANT IDENTITY GRAPH                          │
│                  (Evolvable resolution without rewriting history)        │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   ┌─────────────────────────────────────────────────────────────────┐    │
│   │                       participants                              │    │
│   │                    (canonical entities)                         │    │
│   ├─────────────────────────────────────────────────────────────────┤    │
│   │  id              │ uuid                                         │    │
│   │  user_id         │ owner of this participant record             │    │
│   │  canonical_email │ normalized primary email                     │    │
│   │  display_name    │ preferred display name                       │    │
│   │  inferred_role   │ 'internal' | 'external' | 'counsel' | 'client'│   │
│   │  role_confidence │ 0.0 - 1.0                                    │    │
│   │  role_override   │ user-set override                            │    │
│   └─────────────────────────────────────────────────────────────────┘    │
│                              ▲                                           │
│                              │ *→1                                       │
│                              │                                           │
│   ┌─────────────────────────────────────────────────────────────────┐    │
│   │                  participant_alias_edges                        │    │
│   │               (resolution graph, versioned)                     │    │
│   ├─────────────────────────────────────────────────────────────────┤    │
│   │  from_identity_id │ FK to participant_identities                │    │
│   │  to_participant_id│ FK to participants (canonical)              │    │
│   │  confidence       │ 0.0 - 1.0                                   │    │
│   │  method           │ 'exact_match' | 'domain_match' | 'ml' | ... │    │
│   │  superseded_at    │ when this resolution was replaced           │    │
│   │  superseded_by    │ newer edge ID                               │    │
│   └─────────────────────────────────────────────────────────────────┘    │
│                              ▲                                           │
│                              │ 1→*                                       │
│                              │                                           │
│   ┌─────────────────────────────────────────────────────────────────┐    │
│   │                  participant_identities                         │    │
│   │              (provider-specific identifiers)                    │    │
│   ├─────────────────────────────────────────────────────────────────┤    │
│   │  id               │ uuid                                        │    │
│   │  participant_id   │ FK to participants                          │    │
│   │  provider         │ 'gmail' | 'outlook' | 'slack' | 'manual'    │    │
│   │  address          │ email address or identifier                 │    │
│   │  display_name     │ provider-specific display name              │    │
│   │  verified_at      │ when this identity was verified             │    │
│   │  verification_src │ 'oauth' | 'admin' | 'heuristic'             │    │
│   │  first_seen_at    │ temporal bounds                             │    │
│   │  last_seen_at     │                                             │    │
│   └─────────────────────────────────────────────────────────────────┘    │
│                                                                          │
│   Benefits:                                                              │
│   • Add new aliases without rewriting message history                    │
│   • Track resolution confidence and lineage                              │
│   • Support multi-provider identity (Gmail + Outlook + Slack)            │
│   • Enable iterative ML improvement of identity resolution               │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
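
Resolution over this graph might look like the following sketch. The rule for choosing among several current edges (highest confidence wins) is an assumption for illustration; the diagram only specifies that superseded edges are kept rather than deleted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AliasEdge:
    id: str
    from_identity_id: str
    to_participant_id: str
    confidence: float
    method: str
    superseded_by: Optional[str] = None  # id of the newer edge, None while current


def resolve(identity_id, edges):
    """Return the canonical participant id for an identity, or None.

    Only edges that have not been superseded are considered.
    """
    current = [e for e in edges
               if e.from_identity_id == identity_id and e.superseded_by is None]
    if not current:
        return None
    return max(current, key=lambda e: e.confidence).to_participant_id


edges = [
    # An early heuristic match, later replaced by a better resolution.
    AliasEdge("e1", "ident-7", "participant-A", 0.55, "domain_match",
              superseded_by="e2"),
    AliasEdge("e2", "ident-7", "participant-B", 0.97, "exact_match"),
]
resolved = resolve("ident-7", edges)
```

The superseded edge `e1` stays in the list, so the earlier resolution remains auditable while `resolve` returns the participant from `e2`. Messages keep pointing at `ident-7` throughout, which is how aliases change without rewriting message history.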

Complete table structure (14 tables)

┌──────────────────────────────────────────────────────────────────────────┐
│                        COMPLETE SCHEMA (14 tables)                       │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   IDENTITY LAYER (3 tables)                                              │
│   ┌─────────────────────────────────────────────────────────────────┐    │
│   │  participants                  ─┬─ participant_identities       │    │
│   │  (canonical entities)           └─ participant_alias_edges      │    │
│   └─────────────────────────────────────────────────────────────────┘    │
│                                                                          │
│   COMPUTE LAYER (3 tables)                                               │
│   ┌─────────────────────────────────────────────────────────────────┐    │
│   │  interaction_threads           ─┬─ messages                     │    │
│   │  (SOLE COMPUTE PRIMITIVE)       └─ transcript_artifacts         │    │
│   └─────────────────────────────────────────────────────────────────┘    │
│                                                                          │
│   SCOPE LAYER (2 tables)                                                 │
│   ┌─────────────────────────────────────────────────────────────────┐    │
│   │  conversation_containers       ─── project_containers           │    │
│   │  (USER SCOPE PRIMITIVE)            (temporal membership)        │    │
│   └─────────────────────────────────────────────────────────────────┘    │
│                                                                          │
│   EDGE TABLES (3 tables)                                                 │
│   ┌─────────────────────────────────────────────────────────────────┐    │
│   │  thread_participants           (thread ↔ participant)           │    │
│   │  container_threads             (container ↔ thread, temporal)   │    │
│   │  container_message_exclusions  (aggregation-time redactions)    │    │
│   └─────────────────────────────────────────────────────────────────┘    │
│                                                                          │
│   LINEAGE TABLES (3 tables)                                              │
│   ┌─────────────────────────────────────────────────────────────────┐    │
│   │  thread_merges                 (parent absorbs child)           │    │
│   │  thread_splits                 (original → new thread)          │    │
│   │  thread_split_messages         (which messages moved)           │    │
│   └─────────────────────────────────────────────────────────────────┘    │
│                                                                          │
│   TABLE RELATIONSHIPS:                                                   │
│                                                                          │
│   projects ──────────────────┬──────────────────────────────────────     │
│       │                      │                                           │
│       │ project_documents    │ project_containers                        │
│       │      │               │      │                                    │
│       ▼      ▼               │      ▼                                    │
│   documents              conversation_containers                         │
│                                    │                                     │
│                          container_threads                               │
│                          container_message_exclusions                    │
│                                    │                                     │
│                                    ▼                                     │
│                          interaction_threads ◄─────────────────────      │
│                              │     │     │                         │     │
│                              │     │     │         thread_merges ──┘     │
│                              │     │     │         thread_splits         │
│                              │     │     │         thread_split_messages │
│                              ▼     ▼     ▼                               │
│                         messages  transcript_artifacts                   │
│                              │    thread_participants                    │
│                              │           │                               │
│                              ▼           ▼                               │
│                         participants ◄───┘                               │
│                              │                                           │
│                              ├─── participant_identities                 │
│                              └─── participant_alias_edges                │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

Identity: provider provenance vs canonical identity

A robust system distinguishes:

  • Provider IDs: what Gmail/Outlook/Slack assigns.
  • Canonical IDs: what the system uses internally.

This avoids schema regret when adding providers and prevents accidental coupling to provider semantics.

Messages should capture:

  • provider name
  • provider message id
  • RFC 5322 headers (Message-ID / In-Reply-To / References) where applicable

Threads should capture:

  • provider thread id (if available)
  • canonical interaction_key (provider-independent, versioned)

Participants should be modeled as an identity graph:

  • canonical participant entity
  • provider identities and aliases
  • inferred roles + user overrides
  • confidence and lineage of inferences
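
A minimal sketch of this separation, assuming hypothetical class and field names (`ProviderIdentity`, `Participant`, `IdentityGraph`) that do not come from any real schema. Provider identities are kept as provenance and resolved to one canonical participant; a user override takes precedence over an inferred role.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ProviderIdentity:
    """What a provider assigns. Stored as provenance, never used as a primary key."""
    provider: str       # e.g. "gmail", "slack"
    external_id: str    # provider-scoped id or address


@dataclass
class Participant:
    """Canonical participant: the only id the rest of the system references."""
    participant_id: str
    identities: set = field(default_factory=set)
    inferred_role: Optional[str] = None
    role_confidence: float = 0.0
    role_override: Optional[str] = None   # set by the user

    @property
    def effective_role(self) -> Optional[str]:
        # A user override always wins over an inference.
        return self.role_override or self.inferred_role


class IdentityGraph:
    """Maps provider identities and aliases onto canonical participants."""

    def __init__(self) -> None:
        self._by_identity = {}

    def link(self, participant: Participant, identity: ProviderIdentity) -> None:
        participant.identities.add(identity)
        self._by_identity[identity] = participant

    def resolve(self, provider: str, external_id: str) -> Optional[Participant]:
        return self._by_identity.get(ProviderIdentity(provider, external_id))
```

Adding a provider then means adding identities to the graph, not columns to the participant record.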

Thread inference strategy (interaction_key)

Thread identity needs a deterministic definition. In email ecosystems:

  1. Prefer standards-based linkage when available:
     • RFC 5322 References / In-Reply-To chains
  2. Use provider thread ids as a strong hint, not a universal truth:
     • provider thread ids are helpful but not always semantically correct across forwarding, subject drift, or cross-account merges.
  3. Use clustering fallback:
     • normalized participant set
     • subject root
     • temporal coherence window

The algorithm should be versioned:

  • interaction_key_version increments when the inference logic changes.
  • Thread merges/splits are represented explicitly; history is not rewritten away.
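
The precedence above can be sketched as a pure function. This is an illustration under stated assumptions, not a reference algorithm: `MessageMeta`, `WINDOW_SECONDS`, and the key format are hypothetical, the one-week window is arbitrary, and fixed time buckets will split interactions that straddle a bucket boundary (a real clusterer would need something smarter). The provider thread id is used here only when RFC 5322 linkage is absent.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

INTERACTION_KEY_VERSION = 1        # bump whenever the inference logic changes
WINDOW_SECONDS = 7 * 24 * 3600     # assumed coherence window (one week)
_REPLY_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


@dataclass(frozen=True)
class MessageMeta:
    message_id: Optional[str]          # RFC 5322 Message-ID, if any
    references: tuple                  # RFC 5322 References, oldest first
    in_reply_to: Optional[str]
    provider_thread_id: Optional[str]
    participants: frozenset            # canonical participant ids
    subject: str
    sent_at: datetime                  # timezone-aware


def subject_root(subject: str) -> str:
    """Strip leading Re:/Fw:/Fwd: prefixes and normalize case."""
    return _REPLY_PREFIX.sub("", subject).strip().lower()


def interaction_key(msg: MessageMeta) -> str:
    if msg.references:                 # 1. standards-based: root of the chain
        basis = ("rfc5322", msg.references[0])
    elif msg.in_reply_to:              # parent only, which approximates the root
        basis = ("rfc5322", msg.in_reply_to)
    elif msg.message_id:               # no ancestors: this message is a root
        basis = ("rfc5322", msg.message_id)
    elif msg.provider_thread_id:       # 2. provider hint
        basis = ("provider", msg.provider_thread_id)
    else:                              # 3. clustering fallback
        bucket = int(msg.sent_at.timestamp()) // WINDOW_SECONDS
        basis = ("cluster", "|".join(sorted(msg.participants)),
                 subject_root(msg.subject), str(bucket))
    digest = hashlib.sha256("\x1f".join(basis).encode("utf-8")).hexdigest()[:16]
    return f"v{INTERACTION_KEY_VERSION}:{basis[0]}:{digest}"
```

Because the version is part of the key, keys produced by different versions of the logic never collide; reconciling them is a job for explicit merge/split lineage.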

Transcript canonicalization (and why it must be contract-driven)

Transcript text is not “just concatenation”. It is a derived artifact.

A canonicalization contract should specify:

  • ordering
  • header inclusion
  • quoted-text policy
  • whitespace normalization
  • truncation strategy
  • maximum size constraints

The transcript hash is computed after canonicalization and must encode the canonicalization version.

Two critical operational rules:

  • Hashes from different canonicalization versions are not comparable. Store version alongside the hash.
  • Truncation must be explicit. Store truncation metadata (how many messages omitted, from which side, counts by participant).
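
A minimal sketch of such a contract, assuming hypothetical names (`Msg`, `Transcript`, `canonicalize`, `CANON_VERSION`) and deliberately simple policies: order by timestamp then message id, drop lines starting with ">", collapse whitespace, and truncate from the oldest side. A real contract would spell out each rule in more detail.

```python
import hashlib
import re
from collections import Counter
from dataclasses import dataclass

CANON_VERSION = "canon-v1"   # hypothetical label; bump on any rule change


@dataclass(frozen=True)
class Msg:
    author: str
    sent_at: str       # ISO 8601 UTC, so lexical order equals time order
    message_id: str
    body: str


@dataclass
class Transcript:
    text: str
    canon_version: str
    transcript_hash: str
    omitted_count: int
    omitted_side: str          # "head" or "none"
    omitted_by_author: dict


def canonicalize(messages, max_messages=200):
    # Ordering: timestamp, then message id as a stable tie-breaker.
    ordered = sorted(messages, key=lambda m: (m.sent_at, m.message_id))
    # Truncation: drop the oldest messages and record exactly what was dropped.
    n_omit = max(0, len(ordered) - max_messages)
    omitted, kept = ordered[:n_omit], ordered[n_omit:]
    lines = []
    for m in kept:
        # Quoted-text policy: drop lines that start with ">".
        unquoted = [ln for ln in m.body.splitlines()
                    if not ln.lstrip().startswith(">")]
        # Whitespace normalization: collapse every run to a single space.
        body = re.sub(r"\s+", " ", " ".join(unquoted)).strip()
        lines.append(f"[{m.sent_at}] {m.author}: {body}")
    text = "\n".join(lines)
    # The version is hashed with the text and also stored next to the hash.
    digest = hashlib.sha256(f"{CANON_VERSION}\n{text}".encode("utf-8")).hexdigest()
    return Transcript(text, CANON_VERSION, digest, len(omitted),
                      "head" if omitted else "none",
                      dict(Counter(m.author for m in omitted)))
```

The same set of messages yields the same hash regardless of input order, and a truncated transcript hashes differently from the full one while stating what it left out.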

Storage model (schema sketch)

This is a conceptual schema, not a literal prescription. The essential property is that edges replace arrays for auditability and queryability.

Core tables

  • messages
  • interaction_threads
  • conversation_containers
  • participants

Edge tables

  • thread_participants(thread_id, participant_id, first_seen_at, last_seen_at, message_count, thread_role, …)
  • container_threads(container_id, thread_id, inclusion_policy, added_at, removed_at, …)
  • container_message_exclusions(container_id, thread_id, message_id, excluded_at, reinstated_at, reason, …)

Why edges over arrays:

  • audit trails (“who changed what?”)
  • incremental updates without read-modify-write
  • efficient indexing
  • no array growth pathologies
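
To make the edge idea concrete, here is a small sketch using Python's built-in `sqlite3` module. The `added_by` and `removed_by` columns are assumptions added for the audit trail, and `inclusion_policy` is omitted for brevity; only `container_id`, `thread_id`, `added_at`, and `removed_at` come from the sketch above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE container_threads (
        container_id TEXT NOT NULL,
        thread_id    TEXT NOT NULL,
        added_at     TEXT NOT NULL,   -- ISO 8601, so text order is time order
        added_by     TEXT NOT NULL,   -- assumed audit column
        removed_at   TEXT,            -- NULL while the edge is active
        removed_by   TEXT             -- assumed audit column
    )
""")
conn.execute(
    "CREATE INDEX idx_ct_active ON container_threads (container_id, removed_at)"
)


def add_thread(container_id, thread_id, actor, at):
    # Adding is a single INSERT; nothing is read or rewritten.
    conn.execute(
        "INSERT INTO container_threads (container_id, thread_id, added_at, added_by)"
        " VALUES (?, ?, ?, ?)",
        (container_id, thread_id, at, actor),
    )


def remove_thread(container_id, thread_id, actor, at):
    # Removal closes the active edge instead of deleting the row.
    conn.execute(
        "UPDATE container_threads SET removed_at = ?, removed_by = ?"
        " WHERE container_id = ? AND thread_id = ? AND removed_at IS NULL",
        (at, actor, container_id, thread_id),
    )


def membership_as_of(container_id, at):
    # Temporal query: which threads were in scope at a given moment?
    rows = conn.execute(
        "SELECT thread_id FROM container_threads"
        " WHERE container_id = ? AND added_at <= ?"
        " AND (removed_at IS NULL OR removed_at > ?)"
        " ORDER BY thread_id",
        (container_id, at, at),
    )
    return [r[0] for r in rows]
```

Re-adding a thread after removal creates a new row, so the full membership history stays queryable.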

Lifecycle semantics: one contract per layer

A system becomes un-debuggable when it has multiple deletion signals with unclear authority. The clean approach is:

  • Message.deletedAt: hard exclusion from compute, aggregation, default reads
  • InteractionThread.deletedAt: hard exclusion from compute, aggregation, default reads
  • ConversationContainer.status: UX-only (active/archived)
  • ConversationContainer.deletedAt: soft exclusion from default reads (whether it affects aggregation is a product/legal decision)

Containers should not control what “happened” in an interaction. They control what a user includes in a scope.

If legal privilege requires compute exclusion, treat it as a separate, explicit redaction mechanism rather than as a side effect of container curation.
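
One way to keep these contracts honest is to give each layer its own predicate. The sketch below assumes hypothetical record types and function names; the point it illustrates is that container state never appears in the compute predicate.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Message:
    message_id: str
    deleted_at: Optional[datetime] = None


@dataclass
class InteractionThread:
    thread_id: str
    deleted_at: Optional[datetime] = None


@dataclass
class ConversationContainer:
    container_id: str
    status: str = "active"                 # "active" | "archived", UX-only
    deleted_at: Optional[datetime] = None


def is_computable(thread: InteractionThread, message: Message) -> bool:
    """Hard exclusion: only message and thread deletion affect compute."""
    return thread.deleted_at is None and message.deleted_at is None


def in_default_reads(container: ConversationContainer) -> bool:
    """Soft exclusion: a deleted container is hidden from default reads.

    `status` is intentionally ignored here; how the UI presents archived
    containers is a presentation concern.
    """
    return container.deleted_at is None
```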


Analysis result keying: two-axis versioning

To preserve reproducibility and avoid silent drift, analysis result identity must include:

  • Input versioning: thread + transcript hash (+ canonicalization version)
  • Method versioning: stage + model id + prompt version (+ extractor version)

Conceptually:

(stage_id, thread_id, transcript_hash, canon_version, model_id, prompt_version, extractor_version?)

This enables:

  • parallel runs under new prompts/models
  • comparisons between versions
  • migration without overwriting history
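
As a sketch, the key can be a frozen dataclass used directly as a dictionary or cache key. `AnalysisKey` and `ResultStore` are hypothetical names, and the in-memory dictionary stands in for whatever persistence layer is actually used.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AnalysisKey:
    # Method axis
    stage_id: str
    # Input axis
    thread_id: str
    transcript_hash: str
    canon_version: str
    # Method axis, continued
    model_id: str
    prompt_version: str
    extractor_version: Optional[str] = None


class ResultStore:
    """In-memory stand-in for a results table keyed by AnalysisKey."""

    def __init__(self) -> None:
        self._results = {}

    def get_or_compute(self, key: AnalysisKey, compute: Callable[[], dict]) -> dict:
        # Same input and same method: reuse. Anything else: a new row.
        if key not in self._results:
            self._results[key] = compute()
        return self._results[key]

    def versions_for(self, stage_id: str, thread_id: str) -> list:
        """All stored keys for one stage and thread, for version comparison."""
        return [k for k in self._results
                if k.stage_id == stage_id and k.thread_id == thread_id]
```

Changing the prompt version produces a different key, so the new result is stored alongside the old one instead of overwriting it.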

Success/outcome semantics: distinguish “did work” from “had nothing to do”

A pipeline that returns “success” for a no-op run is operationally dangerous. A clean contract is:

  • overall outcome: completed | skipped_no_sources | partial | failed
  • per-stage: skipped: true with a skipReason

This turns observability into a domain guarantee instead of an afterthought.
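
A minimal sketch of that contract. The outcome values come from the list above; the derivation rules in `overall_outcome` are assumptions made for illustration, in particular the choice to report a run with sources but no executed stage as partial so that it cannot pass for completed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(str, Enum):
    COMPLETED = "completed"
    SKIPPED_NO_SOURCES = "skipped_no_sources"
    PARTIAL = "partial"
    FAILED = "failed"


@dataclass
class StageResult:
    stage_id: str
    skipped: bool = False
    skip_reason: Optional[str] = None
    error: Optional[str] = None

    def __post_init__(self) -> None:
        # A skip without a reason is exactly the silent no-op to avoid.
        if self.skipped and not self.skip_reason:
            raise ValueError(f"stage {self.stage_id}: skipped requires skip_reason")


def overall_outcome(source_count: int, stages: list) -> Outcome:
    if source_count == 0:
        return Outcome.SKIPPED_NO_SOURCES      # nothing to do, and we say so
    ran = [s for s in stages if not s.skipped]
    failed = [s for s in ran if s.error]
    if not ran:
        return Outcome.PARTIAL                 # assumed rule: never "completed"
    if len(failed) == len(ran):
        return Outcome.FAILED
    if failed:
        return Outcome.PARTIAL
    return Outcome.COMPLETED
```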


Implementation notes (monorepo realities)

Keep orchestration separate from business logic

A clean system centralizes business logic (source fetching, transcript construction, analysis compute, persistence) behind API/use-case boundaries.

Schedulers / workflow engines should:

  • trigger the correct API entrypoint
  • provide retries, idempotency keys, and progress metadata
  • record outcome status cleanly

Avoid “polymorphic source types” that hide conceptual mismatch

Polymorphic sourceType/sourceId storage can be useful, but only if the domain primitives are clear.

Under this architecture:

  • behavioural stages should target InteractionThread
  • document stages target Document
  • message-level stages target Message (if needed)

A container is not a compute target.

Make stage routing declarative

The mapping from a stage to its allowed compute target type is a “law”, not a convention.

Define a small stage registry:

  • stage id
  • required/optional
  • compute target type (thread|document|message)
  • compute method version metadata
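
Such a registry can be very small. In the sketch below the stage ids (`escalation_patterns`, `document_summary`) and the `route` function are hypothetical; what matters is that a container is not a member of the target type enum, so routing a behavioural stage at a container fails loudly.

```python
from dataclasses import dataclass
from enum import Enum


class TargetType(str, Enum):
    THREAD = "thread"
    DOCUMENT = "document"
    MESSAGE = "message"
    # Deliberately no CONTAINER member: a container is not a compute target.


@dataclass(frozen=True)
class StageSpec:
    stage_id: str
    required: bool
    target_type: TargetType
    method_version: str


STAGE_REGISTRY = {
    spec.stage_id: spec
    for spec in (
        StageSpec("escalation_patterns", True, TargetType.THREAD, "v1"),
        StageSpec("document_summary", False, TargetType.DOCUMENT, "v1"),
    )
}


def route(stage_id: str, target_type: str) -> StageSpec:
    """Return the stage spec, or raise if the target type is not allowed."""
    spec = STAGE_REGISTRY[stage_id]            # KeyError for unknown stages
    if target_type != spec.target_type.value:
        raise ValueError(
            f"stage {stage_id!r} computes on {spec.target_type.value!r}, "
            f"not {target_type!r}"
        )
    return spec
```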

Migration strategy (from a conflated model)

The guiding principle: do not lie about what legacy data means.

If the legacy conversations concept is Gmail-thread-derived, treat it as a seed for InteractionThreads.

A safe narrative:

  1. Introduce new tables side-by-side.
  2. Backfill interaction_threads from legacy conversations.
  3. Create “default containers” for continuity (one container per thread), but treat containers as a new domain concept.
  4. Repoint project membership edges at containers (or, where projects already reference containers, keep those edges and let containers point at threads).
  5. Move compute to thread transcripts and aggregate at container/project scope.
  6. Retire conflated tables once parity is proven.
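
Steps 2 and 3 can be sketched as a pure transformation. The legacy field names (`id`, `gmail_thread_id`, `title`, `created_at`) and the id prefixes are assumptions for illustration only; the property worth keeping is that ids are derived deterministically from legacy ids, so the backfill can be rerun safely.

```python
def backfill(legacy_conversations):
    """Derive threads, default containers, and edges from legacy rows.

    Returns three lists of plain dicts; writing them to storage is left out.
    """
    threads, containers, edges = [], [], []
    for conv in legacy_conversations:
        thread_id = f"thr_legacy_{conv['id']}"
        container_id = f"ctr_legacy_{conv['id']}"
        # The legacy row is treated as a seed: the provider thread id is
        # kept as provenance, not promoted to canonical identity.
        threads.append({
            "thread_id": thread_id,
            "provider": "gmail",
            "provider_thread_id": conv["gmail_thread_id"],
            "legacy_conversation_id": conv["id"],
        })
        # One default container per thread preserves what users saw before.
        containers.append({
            "container_id": container_id,
            "title": conv["title"],
            "is_default": True,
        })
        edges.append({
            "container_id": container_id,
            "thread_id": thread_id,
            "added_at": conv["created_at"],
            "removed_at": None,
        })
    return threads, containers, edges
```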

The invariants that define “perfect architecture”

  1. Behaviour compute happens only within InteractionThread boundaries.
  2. ConversationContainer is a scope primitive only.
  3. Messages are immutable events; membership and exclusions are edges with auditability.
  4. Transcript building is deterministic and versioned; hashes encode canonicalization version.
  5. Analysis results are keyed by input × method versioning to eliminate silent drift.
  6. Lifecycle semantics are explicit and consistent per layer.

If those invariants hold, most downstream decisions (caching, reruns, UI views, report synthesis) become straightforward: the system expresses intent instead of leaking implementation details into domain meaning.