This guide explains how to build a custom importer for Phonemos using the GraphQL API. Importers allow you to migrate content from external systems (like Confluence, wikis, or documentation platforms) into Phonemos.

Introduction

The Phonemos importer system provides a GraphQL-based API for importing content from external systems. The import process is designed to be flexible and support various source systems while maintaining data integrity and allowing users to review imports before committing them.

Overview

An import follows this general flow:

A user creates an import in the Phonemos UI and receives an import token
The user provides this token to your importer
Your importer uses the token to authenticate and upload data via GraphQL mutations
The user reviews the imported content in Phonemos
The user commits the import, making the content available

Prerequisites

Access to a Phonemos instance with a GraphQL endpoint
Understanding of GraphQL queries and mutations
Ability to make HTTP requests to the GraphQL endpoint
Access to the Phonemos GraphQL schema for type definitions

GraphQL Endpoint and Authentication

The GraphQL endpoint is typically available at:

https://your-instance.phonemos.com/v1/graphql

All import mutations use the anonymous role for authentication. The import token serves as the authorization mechanism, allowing the importer to access only the specific import session.

Core Concepts

Import Token

An import token is a secure identifier that authorizes your importer to upload data to a specific import session. The token format is:

phonemos:import:{site-id}:{hostname}/{secret}

Where:

site-id is a UUID identifying the Phonemos site
hostname is the hostname of the Phonemos instance
secret is a cryptographically secure random string
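
A minimal sketch of parsing and validating this format in Python. The `ImportToken` class, the `parse_import_token` function, and the regular expression are illustrative assumptions. The guide does not specify the alphabet or length of the secret, so the sketch only requires it to be non-empty and free of whitespace.

```python
import re
import uuid
from dataclasses import dataclass

# Assumed shape, taken from the format above:
#   phonemos:import:{site-id}:{hostname}/{secret}
_TOKEN_RE = re.compile(
    r"^phonemos:import:(?P<site_id>[^:]+):(?P<hostname>[^/\s]+)/(?P<secret>\S+)$"
)


@dataclass(frozen=True)
class ImportToken:
    site_id: uuid.UUID
    hostname: str
    secret: str


def parse_import_token(raw: str) -> ImportToken:
    """Split a token into its parts, raising ValueError if it is malformed."""
    match = _TOKEN_RE.match(raw.strip())
    if match is None:
        # The token is deliberately not echoed back: it is a secret.
        raise ValueError("import token does not match the expected format")
    try:
        site_id = uuid.UUID(match["site_id"])
    except ValueError:
        raise ValueError("import token site-id is not a valid UUID") from None
    return ImportToken(site_id, match["hostname"], match["secret"])
```

Running a check like this before the first API call lets the importer reject a mistyped token without sending it anywhere.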

How to obtain a token: Users create imports through the Phonemos UI:

Navigate to Settings → Import Content → Create Import
Select the target language for the import
The UI displays the import token, which the user copies and provides to your importer

The token must be kept secure and is only valid for the specific import session. Never expose tokens in logs or error messages.

Import Lifecycle

An import progresses through several states:

prepared - The import has been created but not yet started
in_progress - The importer has called import_start and is uploading data
processing - All data has been uploaded and Phonemos is processing relationships
review - The import is ready for user review
committed - The user has committed the import and content is available

The importer controls the transition from prepared to in_progress via import_start. The system automatically transitions through processing to review after import_upload_complete is called. The user controls the final transition to committed via the UI.
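
The lifecycle can be modelled as a small state table. This is only a sketch: the Python names are hypothetical, and the exact enum spellings should be checked against import_status_enum in the schema.

```python
from enum import Enum


class ImportStatus(Enum):
    PREPARED = "prepared"
    IN_PROGRESS = "in_progress"
    PROCESSING = "processing"
    REVIEW = "review"
    COMMITTED = "committed"


# Forward transitions and who triggers each one, as described above.
TRANSITIONS = {
    (ImportStatus.PREPARED, ImportStatus.IN_PROGRESS): "importer calls import_start",
    (ImportStatus.IN_PROGRESS, ImportStatus.PROCESSING): "importer calls import_upload_complete",
    (ImportStatus.PROCESSING, ImportStatus.REVIEW): "system, automatically",
    (ImportStatus.REVIEW, ImportStatus.COMMITTED): "user commits in the UI",
}


def can_upload(status: ImportStatus) -> bool:
    """Data may only be uploaded between import_start and import_upload_complete."""
    return status is ImportStatus.IN_PROGRESS
```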

External IDs

External IDs are string identifiers that map objects from your source system to Phonemos objects. They serve several purposes:

Uniqueness: Each object in your source system should have a unique external ID
Relationships: Parent-child relationships between objects use external IDs
Link Resolution: Links between pages reference external IDs
Problem Tracking: Errors and warnings are associated with external IDs

External IDs should be:

Stable and consistent across import runs
Unique within the import session
Human-readable when possible (for debugging)

Examples of good external IDs:

"confluence-page-12345"
"doc-abc-def-123"
"file:attachment-67890"
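
One way to keep external IDs stable is to derive them only from identifiers that already exist in the source system. The sketch below is an illustration; `make_external_id` and `find_duplicates` are hypothetical helper names, and the `source-kind-id` layout is just one possible convention.

```python
from collections import Counter


def make_external_id(source: str, kind: str, source_id) -> str:
    """Build an ID from source-system facts only, so it is identical on every run."""
    return f"{source}-{kind}-{source_id}"


def find_duplicates(external_ids) -> list:
    """Return external IDs that occur more than once in an import session."""
    counts = Counter(external_ids)
    return sorted(eid for eid, n in counts.items() if n > 1)
```

Checking for duplicates before uploading anything catches collisions early, for example when a page and an attachment share a numeric ID in the source system.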

GraphQL API

The importer API is mutation-based. All operations are performed through GraphQL mutations that require an import_token parameter. The mutations are designed to be idempotent where possible, allowing safe retries.

Key characteristics:

All mutations require the import_token parameter
Mutations use the anonymous role (no user authentication required)
Responses include IDs that can be used in subsequent operations
Errors are returned as GraphQL errors with descriptive messages

Refer to the GraphQL schema for complete mutation signatures, input types, and response types.
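
Every mutation is sent as an ordinary GraphQL-over-HTTP POST. The following sketch builds such a request with the Python standard library. The function name is hypothetical, and whether your instance expects additional headers (for example to select the anonymous role) is not stated in this guide, so treat the header set as an assumption.

```python
import json
import urllib.request


def build_graphql_request(endpoint: str, query: str, variables: dict) -> urllib.request.Request:
    """Wrap a GraphQL document and its variables in a JSON POST request."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(request)`. The mutation text itself should be written against the signatures in the schema.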

Import Workflow

Step 1: User Creates Import

The user creates an import in the Phonemos UI:

Navigate to Settings → Import Content → Create Import
Select the target language
Copy the generated import token

This step is performed entirely in the Phonemos UI - your importer does not need to implement this.

Step 2: User Provides Token to Importer

The user provides the import token to your importer. This could be:

Pasted into a configuration file
Entered in a UI form
Passed as a command-line argument
Stored in environment variables

Your importer should validate the token format before proceeding.

Step 3: Start Import

Call the import_start mutation with:

import_token: The token provided by the user
source_name: A human-readable name for your source system (e.g., "Confluence", "MediaWiki")
source_url: Optional URL to the source system
importer_version: Version identifier for your importer (currently "2.1" is supported; "2.0" is deprecated)

This mutation transitions the import from prepared to in_progress status.

Step 4: Import Data

Import your content in this general order:

1. Ensure Users: For each user referenced in your content, call import_ensure_user to create or retrieve the user ID. Cache these IDs for reuse.
2. Import Page Revisions: For each page revision, call import_wikipage_revision to upload the content. Store the returned revision IDs.
3. Import Pages: For each page, call import_wikipage with:
- The page metadata (title, parent, order)
- The list of revision IDs from step 2
- Link mappings (unresolved links to external IDs)
4. Import Documents: For source content in a supported file format (Word, Markdown, HTML, etc.), call import_document to let Phonemos handle the conversion server-side. See the Importing Documents section below.
5. Import Files: For each file:
- Call import_file_upload to get a presigned upload URL
- Upload the file content to the presigned URL
- Call import_file with the file metadata and revision information
6. Import Tasks: If your content includes tasks, call import_task for each task.
7. Report Problems: As you encounter issues, call import_report_problem to record errors, warnings, or informational messages.

The order matters: ensure users before importing content that references them, import revisions before importing pages, and import files before pages that reference them.

Step 5: Handle Problems

Throughout the import process, report problems using import_report_problem:

Errors: Critical issues that prevent content from being imported
Warnings: Issues that don't prevent import but may affect content quality
Info: Informational messages about the import process

Problems are associated with external IDs, allowing users to see which objects had issues.

Step 6: Complete Upload

Once all data has been uploaded, call import_upload_complete. This signals that no more data will be uploaded and allows Phonemos to begin processing relationships and preparing the import for review.

Step 7: User Commits Import

The user reviews the imported content in the Phonemos UI and commits the import. This step is performed in the UI - your importer does not need to implement this.

Importing Documents

The import_document mutation imports a document file by converting it server-side via Pandoc. Upload the source file as-is — no manual AST construction required.

Supported Formats

format value	Description
docx	Microsoft Word
markdown	Pandoc-extended Markdown
gfm	GitHub Flavoured Markdown (.md, .gfm)
commonmark	CommonMark
commonmark_x	CommonMark with extensions
markdown_strict	Strict Markdown
html	HTML
odt	OpenDocument Text (LibreOffice/OpenOffice)
rtf	Rich Text Format
latex	LaTeX
typst	Typst
mediawiki	MediaWiki markup
dokuwiki	DokuWiki markup
pandoc	Pandoc native JSON format

Usage

1. Get a temporary upload URL by calling import_temporary_upload. It returns upload_url, upload_headers (required for the upload request), and a file_key handle.
2. Upload the file via HTTP PUT to upload_url, including all upload_headers.
3. Call import_document with:
- import_token: your import token
- input_file_key: the file_key from step 1
- format: one of the format values from the table above
- title (optional): page title; defaults to the filename
- language (optional): language override
- file_in_archive (optional): for archive uploads (.zip, .tar), the path to the target file inside the archive
- external_id / parent_external_id (optional): place the resulting page in the import hierarchy
- relative_link_prefix (optional): prefix for resolving relative links to other pages in the import
The call returns a direct_import_job_id.
4. Poll import_document_status with the direct_import_job_id until a terminal state is reached:
- created - job queued
- converting - Pandoc conversion running
- preparing_import / importing_objects / importing_content - import in progress
- done - success; root_object_id contains the created page ID
- failed - inspect error_summary and error_details
- cancelled

Multiple import_document calls can run concurrently. For archive files, reuse the same file_key with different file_in_archive paths to import multiple documents from a single upload.
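
The polling step can be sketched as a loop that stops on any terminal state or after a deadline. `fetch_status` is a hypothetical callable that would query import_document_status and return a dict. The `"state"` key is an assumption, so check the schema for the actual field name.

```python
import time

TERMINAL_STATES = frozenset({"done", "failed", "cancelled"})


def wait_for_document(fetch_status, poll_interval=2.0, timeout=600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a terminal state; raise TimeoutError otherwise."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status["state"] in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"document import still in state {status['state']!r} after {timeout}s"
            )
        sleep(poll_interval)
```

The caller still has to inspect the returned state: done carries root_object_id, while failed should be turned into a reported problem using error_summary and error_details.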

Data Types and Operations

Pages (Wikipages)

Pages are the primary content type in Phonemos. A page consists of:

Metadata: Title, parent page, order within parent
Revisions: Historical versions of the page content
Links: References to other pages or files

Structure:

A page has an external_id that identifies it in your source system
A page has a parent_external_id (optional) for hierarchical organization
A page has an orderInParent (optional) for ordering siblings
A page references multiple revisions via revision IDs

Revisions:

Each revision represents a version of the page at a point in time
Revisions include: content (as JSON), publication timestamp, publisher user ID
Revisions are imported separately before the page itself
The page references all its revisions via an array of revision IDs

Link Resolution:

Pages may contain links to other pages or files
Links are initially unresolved (referencing source system URLs or identifiers)
Provide a mapping from unresolved link identifiers to external IDs
Phonemos uses this mapping to resolve links during processing

Refer to the GraphQL schema for the complete structure of import_wikipage_input and import_wikipage_revision_input.

Files

Files are binary content (documents, images, etc.) attached to pages or stored independently.

Upload Process:

Call import_file_upload with file metadata (mime type, size)
Receive a presigned upload URL and encryption key
Upload the file content directly to the presigned URL using HTTP PUT
Store the returned file_content_id for use in file revisions
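
The upload itself is a plain HTTP PUT. The sketch below builds the request with the standard library; the function name and the `extra_headers` parameter are hypothetical. How the encryption key is meant to be used is not covered in this guide, so the sketch does not send it. Consult the schema and your Phonemos administrator for that detail.

```python
import urllib.request


def build_upload_request(upload_url: str, content: bytes, mime_type: str,
                         extra_headers=None) -> urllib.request.Request:
    """Prepare a PUT of raw file bytes to a presigned URL."""
    # Content-Type is set explicitly: urllib would otherwise add a
    # form-encoded default for requests that carry a body.
    headers = {"Content-Type": mime_type}
    headers.update(extra_headers or {})
    return urllib.request.Request(upload_url, data=content, headers=headers, method="PUT")
```

Sending it is then `urllib.request.urlopen(request)`. Any headers returned alongside a presigned URL can be passed through `extra_headers`.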

File Structure:

A file has an external_id identifying it in your source system
A file has a parent_external_id (typically a page external ID)
A file has multiple revisions, each referencing a file_content_id

Revisions:

Each revision represents a version of the file
Revisions include: creation timestamp, creator user ID, file content ID
All revisions for a file are provided when calling import_file

Refer to the GraphQL schema for the complete structure of import_file_file and import_file_revision.

Tasks

Tasks are actionable items that can be embedded in page content or stored independently.

Task Structure:

Each task has a UUID id (generate this in your importer)
Tasks have: description, completion status, assignee, due date
Tasks reference users via user IDs (ensure users first)
Tasks are linked to pages through the page content JSON

Importing Tasks:

Tasks that are not referenced in page content can be imported before or after pages (order doesn't matter)
Tasks referenced in page content should be imported before the page
Use the same task ID consistently if a task appears in multiple revisions
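
One way to get the same task ID every time is to derive it deterministically from the task's identifier in the source system using a name-based UUID (version 5). This is a sketch of that idea, not a requirement of the API; the namespace URL below is a made-up placeholder.

```python
import uuid

# Hypothetical namespace: any fixed UUID works as long as it never changes between runs.
TASK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/my-importer/tasks")


def task_id_for(source_task_key: str) -> str:
    """Return the same UUID for the same source task, across revisions and runs."""
    return str(uuid.uuid5(TASK_NAMESPACE, source_task_key))
```

Compared with random UUIDs, this avoids having to persist a lookup table between the revision pass and the task pass.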

Refer to the GraphQL schema for the complete structure of ImportTask.

Users

Users are people who created or modified content in your source system.

Ensuring Users:

Call import_ensure_user for each unique user email address
Provide the user's email (required) and display name (optional)
The mutation returns a user ID (UUID) - cache this for reuse
The same user can be referenced multiple times - reuse the cached ID
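
A small cache keeps the number of import_ensure_user calls down to one per user. In this sketch `ensure_user` is a hypothetical callable that would issue the mutation and return the user ID. Normalizing emails to lower case is an assumption about how your source data should be deduplicated.

```python
from typing import Callable, Dict, Optional


class UserCache:
    """Remember the user ID returned for each email so each user is ensured once."""

    def __init__(self, ensure_user: Callable[[str, Optional[str]], str]):
        self._ensure_user = ensure_user
        self._ids: Dict[str, str] = {}

    def get(self, email: str, display_name: Optional[str] = None) -> str:
        key = email.strip().lower()  # assumption: emails differing only in case are one user
        if key not in self._ids:
            self._ids[key] = self._ensure_user(key, display_name)
        return self._ids[key]
```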

User IDs in Content:

Page revisions reference publisher_user_id
File revisions reference created_by_user_id
Tasks reference created_by_user_id and assignee_user_id

Always ensure users before importing content that references them.

Refer to the GraphQL schema for the complete structure of import_ensure_user.

Problems

Problems are errors, warnings, or informational messages about the import process.

Severity Levels:

error: Critical issues that prevent content from being imported correctly
warning: Issues that don't prevent import but may affect quality
info: Informational messages about the import

Problem Structure:

Associated with an external_id (the object that had the problem)
Includes a message describing the issue
Optional details for additional context
Optional source_link to the original content
Optional affected_version if the problem is version-specific
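
The structure above maps naturally onto a small record type. This is a sketch: the class names and the `severity` key are assumptions, and the authoritative field list is import_report_problem_problem in the schema.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class Severity(str, Enum):
    ERROR = "error"
    WARNING = "warning"
    INFO = "info"


@dataclass
class Problem:
    external_id: str
    severity: Severity
    message: str
    details: Optional[str] = None
    source_link: Optional[str] = None
    affected_version: Optional[str] = None

    def to_variables(self) -> dict:
        """Return a plain dict with unset optional fields left out."""
        data = asdict(self)
        data["severity"] = self.severity.value
        return {key: value for key, value in data.items() if value is not None}
```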

Clearing Problems:

Call import_clear_problems to remove previously reported problems for an external ID
Useful when retrying an import after fixing issues

Refer to the GraphQL schema for the complete structure of import_report_problem_problem.

GraphQL Mutations Overview

The following mutations are available for importers. Refer to the GraphQL schema for complete type definitions.

Import Lifecycle Mutations

import_start: Start an import session. Must be called before uploading any data.
import_upload_complete: Signal that all data has been uploaded. Triggers processing.

Content Mutations

import_wikipage: Import a page with its metadata and revision references.
import_wikipage_revision: Import a single page revision with content.
import_file: Import a file with its revisions.
import_file_upload: Get a presigned URL for uploading file content.
import_task: Import a task.
import_temporary_upload: Get a presigned URL for uploading a file to the temporary store. Required before calling import_document.
import_document: Import a document in any supported format via server-side Pandoc conversion. Returns a direct_import_job_id.
import_document_status: Poll the status of a document import job.
import_filestack: Create a file container (filestack) within the import.

User Management

import_ensure_user: Create or retrieve a user by email address.

Problem Management

import_report_problem: Report an error, warning, or info message.
import_clear_problems: Clear previously reported problems for an external ID.

Utility Mutations

import_convert_confluence_page: Convert Confluence XHTML to Phonemos format (if applicable).

Mutation Dependencies

Mutations have dependencies that must be respected:

import_start must be called first
Users must be ensured before content that references them
Page revisions must be imported before pages that reference them
File content must be uploaded before file revisions reference it
import_upload_complete should be called last (after all data is uploaded)

Common Patterns

Ensuring Users Before Import:

Collect all unique user emails from your source data
Call import_ensure_user for each email
Cache the returned user IDs
Use cached IDs when importing content

Importing Pages with Revisions:

For each page, import all revisions first (collecting revision IDs)
Then import the page, referencing the revision IDs
Provide link mappings for unresolved links

Handling File Uploads:

For each file revision, call import_file_upload to get an upload URL
Upload file content to the presigned URL
Collect file content IDs
Import the file with all revisions referencing the content IDs

Implementation Guidelines

Error Handling

GraphQL mutations return errors in the standard GraphQL error format. Your importer should:

Check for errors: Always inspect the response for errors
Retry transient failures: Network errors and temporary server issues can be retried
Report persistent errors: Use import_report_problem to record errors that prevent import
Continue on non-critical errors: Don't abort the entire import for a single object failure
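
In the standard GraphQL response format, failures arrive in a top-level `errors` list, often with an HTTP 200 status, so the HTTP status alone is not enough. A minimal check might look like this (the exception and function names are hypothetical):

```python
class GraphQLRequestError(Exception):
    """Raised when a GraphQL response contains one or more errors."""

    def __init__(self, errors):
        self.errors = errors
        messages = "; ".join(str(e.get("message", "unknown error")) for e in errors)
        super().__init__(messages)


def unwrap(payload: dict) -> dict:
    """Return the response data, or raise if the response carries errors."""
    errors = payload.get("errors")
    if errors:
        raise GraphQLRequestError(errors)
    return payload.get("data") or {}
```

Calling `unwrap` on every decoded response gives the rest of the importer a single exception type to catch and report per object.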

Retry Patterns

Network failures and temporary server issues are common. Implement retry logic:

Exponential backoff: Wait progressively longer between retries
Limit retries: Don't retry indefinitely
Distinguish error types: Some errors (like invalid token) shouldn't be retried
Idempotent operations: Most mutations are idempotent, allowing safe retries
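
A sketch of these points combined: exponential backoff with jitter, a retry limit, and retries only for exception types considered transient. The default `transient` tuple is an assumption; extend it with whatever your HTTP client raises for network failures, and keep errors such as an invalid token out of it.

```python
import random
import time


def with_retries(operation, max_attempts=5, base_delay=0.5, max_delay=30.0,
                 transient=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except transient:
            if attempt == max_attempts:
                raise  # retry budget exhausted
            # 0.5s, 1s, 2s, ... capped at max_delay, plus up to 50% jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

Any exception outside `transient` propagates on the first attempt, which is the desired behaviour for permanent errors.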

Concurrency

Importing can be parallelized for better performance:

Parallel user ensures: Multiple users can be ensured concurrently
Parallel revision imports: Page revisions can be imported concurrently
Parallel file uploads: File uploads can happen concurrently
Respect rate limits: Don't overwhelm the server with too many concurrent requests

Consider implementing:

A semaphore or similar mechanism to limit concurrent requests
Batching operations where possible
Progress tracking for concurrent operations
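
As one "similar mechanism", a thread pool with a fixed number of workers caps the number of requests in flight without an explicit semaphore. The helper below is a sketch with hypothetical names. It assumes items are hashable (for example external IDs) and that `worker` performs one independent request.

```python
from concurrent.futures import ThreadPoolExecutor


def run_bounded(items, worker, max_concurrency=4):
    """Apply worker to every item with at most max_concurrency calls in flight.

    Returns (results, failures), both keyed by item, so one failing object
    does not abort the rest.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = {pool.submit(worker, item): item for item in items}
        for future, item in futures.items():
            try:
                results[item] = future.result()
            except Exception as exc:
                failures[item] = exc
    return results, failures
```

Start with a low `max_concurrency` and raise it only if the server handles the load comfortably.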

Progress Tracking

For long-running imports, track progress:

Count objects: Track total pages, files, revisions to import
Report progress: Update progress as objects are imported
Handle failures: Continue importing even if some objects fail
Final summary: Report total imported, failed, and skipped objects
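
A simple counter object is enough to cover these points. The sketch below uses hypothetical names, records failures per external ID without stopping the run, and produces a one-line summary.

```python
from dataclasses import dataclass, field


@dataclass
class ImportProgress:
    total: int
    imported: int = 0
    failed: int = 0
    skipped: int = 0
    failures: list = field(default_factory=list)

    def run(self, external_id: str, operation) -> None:
        """Run one import step; count it as imported or failed, never re-raise."""
        try:
            operation()
        except Exception as exc:
            self.failed += 1
            self.failures.append((external_id, str(exc)))
        else:
            self.imported += 1

    def skip(self) -> None:
        self.skipped += 1

    def summary(self) -> str:
        done = self.imported + self.failed + self.skipped
        return (f"{done}/{self.total} processed: {self.imported} imported, "
                f"{self.failed} failed, {self.skipped} skipped")
```

The entries collected in `failures` can be forwarded to import_report_problem so they also show up for the user during review.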

Token Security

The import token is sensitive and should be:

Kept secure: Never log the full token (log only a masked version)
Not exposed: Don't include tokens in error messages or user-facing output
Validated: Verify token format before making API calls
Single-use mindset: Treat tokens as single-use, even though they're valid for the entire import session
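
A sketch of masking a token for log output. It keeps the non-secret prefix and shows only the first and last few characters of the secret. Treating the site ID and hostname as safe to log is an assumption; mask them too if your environment considers them sensitive.

```python
def mask_token(token: str, visible: int = 4) -> str:
    """Return a log-safe form of an import token."""
    prefix, separator, secret = token.rpartition("/")
    if not separator:
        return "***"  # not in the expected format: reveal nothing
    if len(secret) <= visible * 2:
        return f"{prefix}/***"  # too short to show any part safely
    return f"{prefix}/{secret[:visible]}...{secret[-visible:]}"
```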

GraphQL Schema Reference

For complete API details, consult the Phonemos GraphQL schema. The schema includes:

Complete mutation signatures: All parameters, types, and return values
Input types: Detailed structure of all input objects
Response types: Structure of all response objects
Enums: All enumeration values (status types, severity levels, etc.)
Field descriptions: Documentation for each field

Accessing the Schema

The GraphQL schema can be accessed via:

Introspection: Use GraphQL introspection queries to explore the schema
Schema file: Request the schema file from your Phonemos administrator
GraphQL playground: Many GraphQL endpoints provide a playground UI for exploring the schema
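
For the introspection route, the standard `__schema` query below lists the fields of the Mutation type, and a small helper filters them down to the importer mutations. The helper name is hypothetical, and introspection may be disabled on some instances, in which case the schema file is the fallback.

```python
INTROSPECTION_QUERY = """
query ImportMutations {
  __schema {
    mutationType {
      fields {
        name
        description
        args { name type { kind name ofType { kind name } } }
      }
    }
  }
}
"""


def import_mutations(introspection_data: dict) -> list:
    """Return the sorted names of all import_* mutations from an introspection result."""
    fields = introspection_data["__schema"]["mutationType"]["fields"]
    return sorted(f["name"] for f in fields if f["name"].startswith("import_"))
```

`introspection_data` is the `data` part of the response to `INTROSPECTION_QUERY`.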

Key Schema Locations

In the schema, look for:

Mutations: All import_* mutations in the Mutation type
Input types: Types prefixed with import_ (e.g., import_wikipage_input)
Response types: Types ending with _response (e.g., import_wikipage_response)
Enums: import_status_enum, import_report_problem_severity_enum

Example: Finding Mutation Details

To find details about import_wikipage:

Look for import_wikipage in the Mutation type
Note the input type (import_wikipage_input)
Find the definition of import_wikipage_input to see all required and optional fields
Find the response type (import_wikipage_response) to see what data is returned

Best Practices

Token Security and Handling

Validate token format before use
Mask tokens in logs (show only first/last few characters)
Never expose tokens in error messages
Store tokens securely (encrypted if persisted)

Error Handling and Reporting

Always check for GraphQL errors in responses
Report problems using import_report_problem for user visibility
Distinguish between transient and permanent errors
Continue importing even when some objects fail
Provide clear error messages with context

Performance Considerations

Batch operations: Group related operations when possible
Concurrency: Use parallel requests for independent operations
Rate limiting: Respect server capacity and rate limits
Caching: Cache user IDs and other frequently accessed data
Incremental imports: Support resuming interrupted imports

Testing Strategies

Test with small datasets first: Verify your importer works before large imports
Test error handling: Verify behavior with invalid data, network failures
Test idempotency: Verify retries don't create duplicates
Test edge cases: Empty content, missing users, circular references
Validate output: Verify imported content in Phonemos UI

Common Pitfalls to Avoid

Missing user ensures: Always ensure users before importing content that references them
Wrong revision order: Import revisions before pages that reference them
Missing upload complete: Always call import_upload_complete when done
Not reporting problems: Use import_report_problem to help users understand issues
Ignoring errors: Handle GraphQL errors appropriately
Token exposure: Never log or expose full tokens
Not polling document status: After calling import_document, always poll import_document_status until the job reaches a terminal state (done, failed, or cancelled).

Conclusion

Building a custom importer for Phonemos involves:

Obtaining an import token from the Phonemos UI
Starting the import with import_start
Ensuring users exist before importing content
Importing revisions, pages, files, and tasks in the correct order
Reporting problems as they occur
Completing the upload with import_upload_complete

The GraphQL schema provides all the details needed for implementation. Focus on understanding the data model, respecting dependencies between operations, and handling errors gracefully.

For specific API details, always refer to the GraphQL schema provided with your Phonemos instance.
