027.15 Agent Import File
Description
As an AI Agent, I want to import a source file by providing its file path so that I can programmatically add documents, PDFs, images, audio, and video to the project without requiring the researcher to use the UI.
Currently the agent can only add text sources by providing inline content (`add_text_source`). For binary and large files (PDFs, images, audio, video), the agent needs to reference a file on the local filesystem. The system reads the file, determines its type, extracts text where applicable, and creates the source entry.
Trust Level: T3 (Suggest) — Agent proposes a file to import, researcher approves. File access from the local filesystem requires trust that the path is valid and intended.
Bounded Context: Sources
Layer: Interface (MCP tool)
Acceptance Criteria
- [x] #1 MCP tool `import_file_source` is registered with `file_path` as a required parameter
- [x] #2 Agent can import text files (.txt, .docx, .rtf) by providing an absolute file path
- [x] #3 Agent can import PDF files (.pdf) with automatic text extraction
- [x] #4 Agent can import image files (.png, .jpg, .gif)
- [x] #5 Agent can import audio/video files (.mp3, .wav, .mp4, .mov)
- [x] #6 File type is auto-detected from the file extension
- [x] #7 Non-existent or inaccessible file paths return a clear failure message
- [x] #8 Unsupported file extensions are rejected with a clear failure message listing supported types
- [x] #9 Duplicate source names are rejected (source name defaults to filename, with optional override)
- [x] #10 Optional `name` parameter allows the agent to override the default filename-based source name
- [x] #11 `SourceAdded` domain event is published on success
- [x] #12 Tool returns source ID, name, type, status, and file size on success
- [x] #13 E2E test exists with `@allure.story("QC-027.15 Agent Import File Source")` decorator
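Criteria #2 and #6 through #8 can be illustrated with a minimal sketch of extension-based type detection. The function name `detect_source_type`, the `SUPPORTED_TYPES` mapping, the type labels, and the message wording are assumptions made for illustration; they are not taken from the QualCoder codebase. Only the extension list comes from the criteria above.

```python
from __future__ import annotations

import os
from pathlib import Path

# Hypothetical mapping; the extensions are the ones listed in criteria #2-#5,
# the type labels on the right are illustrative.
SUPPORTED_TYPES = {
    ".txt": "text", ".docx": "text", ".rtf": "text",
    ".pdf": "pdf",
    ".png": "image", ".jpg": "image", ".gif": "image",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video", ".mov": "video",
}


def detect_source_type(file_path: str) -> tuple[bool, str]:
    """Return (True, source_type) or (False, failure_message)."""
    path = Path(file_path)
    if not path.is_absolute():
        return False, f"File path must be absolute: {file_path}"
    if not path.is_file() or not os.access(path, os.R_OK):
        return False, f"File not found or not accessible: {file_path}"
    # Compare case-insensitively so that "scan.PDF" is treated like "scan.pdf".
    extension = path.suffix.lower()
    if extension not in SUPPORTED_TYPES:
        supported = ", ".join(sorted(SUPPORTED_TYPES))
        return False, (
            f"Unsupported file extension '{extension}'. "
            f"Supported types: {supported}"
        )
    return True, SUPPORTED_TYPES[extension]
```

Returning a failure message rather than raising keeps the result easy to pass back to the agent as a tool response; whether the actual tool does this is not stated in the criteria.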
Notes
- Source name defaults to the filename (e.g., `/data/interview.pdf` becomes `interview.pdf`) unless overridden via the optional `name` parameter
- Text extraction for PDFs reuses the existing PDF import pipeline
- Image and media sources store the file reference but have no fulltext; they are available for region/timeline coding
- The file must be on the local filesystem accessible to QualCoder — no remote URLs
- Consider a `dry_run` parameter (similar to `remove_source`'s `confirm=false`) to let the agent verify the file is valid before committing
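The naming and `dry_run` notes above can be sketched as follows. This is only an illustration of the idea: `plan_import`, the `existing_names` set (a stand-in for the project's source list), and the shape of the returned dictionary are all hypothetical, and real source creation and event publishing are omitted.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional


def plan_import(
    file_path: str,
    existing_names: set[str],
    name: Optional[str] = None,
    dry_run: bool = False,
) -> dict:
    """Validate an import and, unless dry_run is set, record the new source name."""
    path = Path(file_path)
    # The source name defaults to the filename unless the caller overrides it.
    source_name = name or path.name
    if not path.is_file():
        return {"success": False,
                "error": f"File not found or not accessible: {file_path}"}
    if source_name in existing_names:
        return {"success": False,
                "error": f"A source named '{source_name}' already exists"}
    if not dry_run:
        # Stand-in for creating the source entry in the project.
        existing_names.add(source_name)
    return {
        "success": True,
        "name": source_name,
        "file_size": path.stat().st_size,
        "dry_run": dry_run,
    }
```

With `dry_run=True` the same validation runs but nothing is recorded, so the agent can check a path and name before asking the researcher to approve the actual import.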