027.15 Agent Import File

Description

As an AI Agent, I want to import a source file by providing its file path so that I can programmatically add documents, PDFs, images, audio, and video to the project without requiring the researcher to use the UI.

Currently the agent can only add text sources by providing inline content (add_text_source). For binary and large files (PDFs, images, audio, video), the agent needs to reference a file on the local filesystem. The system reads the file, determines its type, extracts text where applicable, and creates the source entry.

Trust Level: T3 (Suggest) — Agent proposes a file to import, researcher approves. File access from the local filesystem requires trust that the path is valid and intended.

Bounded Context: Sources

Layer: Interface (MCP tool)

Acceptance Criteria

  • [x] #1 MCP tool import_file_source is registered with file_path as a required parameter
  • [x] #2 Agent can import text files (.txt, .docx, .rtf) by providing an absolute file path
  • [x] #3 Agent can import PDF files (.pdf) with automatic text extraction
  • [x] #4 Agent can import image files (.png, .jpg, .gif)
  • [x] #5 Agent can import audio/video files (.mp3, .wav, .mp4, .mov)
  • [x] #6 File type is auto-detected from the file extension
  • [x] #7 Non-existent or inaccessible file paths return a clear failure message
  • [x] #8 Unsupported file extensions are rejected with a clear failure message listing supported types
  • [x] #9 Duplicate source names are rejected (source name defaults to filename, with optional override)
  • [x] #10 Optional name parameter allows the agent to override the default filename-based source name
  • [x] #11 SourceAdded domain event is published on success
  • [x] #12 Tool returns source ID, name, type, status, and file size on success
  • [x] #13 E2E test exists with @allure.story("QC-027.15 Agent Import File Source") decorator
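
Criteria #6, #7, #8 and #12 can be illustrated with a minimal sketch. This is not the actual QualCoder implementation: the function names, the type labels, and the shape of the result dictionary are assumptions made for illustration. Only the extension list and the returned fields (name, type, file size) come from the criteria above; the source ID and status would be assigned when the source is actually created.

```python
from __future__ import annotations

import os
from pathlib import Path

# Extensions are taken from the acceptance criteria; the type labels
# ("text", "pdf", ...) are illustrative assumptions.
SUPPORTED_EXTENSIONS = {
    ".txt": "text", ".docx": "text", ".rtf": "text",
    ".pdf": "pdf",
    ".png": "image", ".jpg": "image", ".gif": "image",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video", ".mov": "video",
}


def detect_source_type(file_path: str) -> str | None:
    """Return the source type for the path's extension, or None if unsupported."""
    return SUPPORTED_EXTENSIONS.get(Path(file_path).suffix.lower())


def check_import_path(file_path: str) -> dict:
    """Validate a path and return a hypothetical success/failure payload."""
    path = Path(file_path)
    # Assumption: relative paths are rejected, since criterion #2 asks for absolute paths.
    if not path.is_absolute():
        return {"success": False, "error": f"Path must be absolute: {file_path}"}
    # Criterion #7: missing or unreadable files fail with a clear message.
    if not path.is_file() or not os.access(path, os.R_OK):
        return {"success": False, "error": f"File not found or not readable: {file_path}"}
    # Criterion #8: unsupported extensions fail and list the supported types.
    source_type = detect_source_type(file_path)
    if source_type is None:
        supported = ", ".join(sorted(SUPPORTED_EXTENSIONS))
        return {
            "success": False,
            "error": f"Unsupported file type '{path.suffix}'. Supported types: {supported}",
        }
    return {
        "success": True,
        "name": path.name,
        "type": source_type,
        "file_size": path.stat().st_size,
    }
```

Matching on the lowercased suffix means that interview.PDF and interview.pdf are treated the same way. Whether the real tool is case-insensitive is not stated in this task.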

Notes

  • Source name defaults to the filename (e.g., /data/interview.pdf becomes interview.pdf) unless overridden via the optional name parameter
  • Text extraction for PDFs reuses the existing PDF import pipeline
  • Image and media sources store the file reference but have no fulltext; they are available for region/timeline coding
  • The file must be on the local filesystem accessible to QualCoder — no remote URLs
  • Consider a dry_run parameter (similar to remove_source's confirm=false) to let the agent verify the file is valid before committing
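
The naming rule and the proposed dry_run behaviour could look roughly like the sketch below. Everything here is hypothetical: resolve_source_name, plan_import, the existing_names set, and the result keys are illustrative names, and the set stands in for whatever repository lookup the Sources context actually uses.

```python
from __future__ import annotations

from pathlib import Path


def resolve_source_name(file_path: str, name: str | None = None) -> str:
    """Use the explicit name if given, otherwise the filename including extension."""
    return name if name else Path(file_path).name


def plan_import(
    file_path: str,
    existing_names: set[str],
    name: str | None = None,
    dry_run: bool = False,
) -> dict:
    """Check the source name and, unless dry_run is set, register it."""
    source_name = resolve_source_name(file_path, name)
    # Duplicate names are rejected whether or not this is a dry run.
    if source_name in existing_names:
        return {"success": False, "error": f"A source named '{source_name}' already exists"}
    if dry_run:
        # Report what would happen without changing any state.
        return {"success": True, "preview": True, "name": source_name}
    existing_names.add(source_name)  # stand-in for creating the source entry
    return {"success": True, "preview": False, "name": source_name}
```

With this shape, an agent could call the tool once with dry_run=True to confirm that the name is free, then repeat the call with dry_run=False after the researcher approves the import.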