This guide explains how to build a custom importer for Phonemos using the GraphQL API. Importers allow you to migrate content from external systems (like Confluence, wikis, or documentation platforms) into Phonemos.
Introduction
The Phonemos importer system provides a GraphQL-based API for importing content from external systems. The import process is designed to be flexible and support various source systems while maintaining data integrity and allowing users to review imports before committing them.
Overview
An import follows this general flow:
A user creates an import in the Phonemos UI and receives an import token
The user provides this token to your importer
Your importer uses the token to authenticate and upload data via GraphQL mutations
The user reviews the imported content in Phonemos
The user commits the import, making the content available
Prerequisites
Access to a Phonemos instance with a GraphQL endpoint
Understanding of GraphQL queries and mutations
Ability to make HTTP requests to the GraphQL endpoint
Access to the Phonemos GraphQL schema for type definitions
GraphQL Endpoint and Authentication
The GraphQL endpoint is typically available at:
https://your-instance.phonemos.com/v1/graphql
All import mutations use the anonymous role for authentication. The import token serves as the authorization mechanism, allowing the importer to access only the specific import session.
Core Concepts
Import Token
An import token is a secure identifier that authorizes your importer to upload data to a specific import session. The token format is:
phonemos:import:{site-id}:{hostname}/{secret}
Where:
site-id is a UUID identifying the Phonemos site
hostname is the hostname of the Phonemos instance
secret is a cryptographically secure random string
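The format above can be checked before any API call is made. The following is a minimal sketch, not an official validator: the regular expression is derived only from the documented format, and the names ImportToken and parse_import_token are hypothetical. Error messages deliberately omit the token itself.

```python
import re
import uuid
from dataclasses import dataclass

# Assumed pattern, derived only from the documented format
# phonemos:import:{site-id}:{hostname}/{secret}
_TOKEN_RE = re.compile(
    r"^phonemos:import:(?P<site_id>[^:/]+):(?P<hostname>[^/]+)/(?P<secret>.+)$"
)


@dataclass(frozen=True)
class ImportToken:
    site_id: uuid.UUID
    hostname: str
    secret: str


def parse_import_token(token: str) -> ImportToken:
    """Split a token into its parts, raising ValueError if the format is wrong."""
    match = _TOKEN_RE.match(token.strip())
    if match is None:
        raise ValueError("import token does not match the expected format")
    try:
        site_id = uuid.UUID(match.group("site_id"))
    except ValueError:
        # Never echo the token (or parts of it) back in the error message.
        raise ValueError("import token site-id is not a valid UUID") from None
    return ImportToken(site_id, match.group("hostname"), match.group("secret"))
```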
How to obtain a token: Users create imports through the Phonemos UI:
Navigate to Settings → Import Content → Create Import
Select the target language for the import
The UI displays the import token, which the user copies and provides to your importer
The token must be kept secure and is only valid for the specific import session. Never expose tokens in logs or error messages.
Import Lifecycle
An import progresses through several states:
prepared - The import has been created but not yet started
in_progress - The importer has called import_start and is uploading data
processing - All data has been uploaded and Phonemos is processing relationships
review - The import is ready for user review
committed - The user has committed the import and content is available
The importer controls the transition from prepared to in_progress via import_start. The system automatically transitions through processing to review after import_upload_complete is called. The user controls the final transition to committed via the UI.
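An importer may find it useful to model these states locally, for example to refuse uploads once the import has left in_progress. The sketch below assumes the enum values in the schema match the state names listed above. The TRANSITIONS table and can_upload helper are illustrative names, not part of the Phonemos API.

```python
from enum import Enum


class ImportStatus(Enum):
    PREPARED = "prepared"
    IN_PROGRESS = "in_progress"
    PROCESSING = "processing"
    REVIEW = "review"
    COMMITTED = "committed"


# Who drives each transition, following the lifecycle described above.
TRANSITIONS = {
    (ImportStatus.PREPARED, ImportStatus.IN_PROGRESS): "importer (import_start)",
    (ImportStatus.IN_PROGRESS, ImportStatus.PROCESSING): "importer (import_upload_complete)",
    (ImportStatus.PROCESSING, ImportStatus.REVIEW): "system",
    (ImportStatus.REVIEW, ImportStatus.COMMITTED): "user (UI)",
}


def can_upload(status: ImportStatus) -> bool:
    """Data may only be uploaded between import_start and import_upload_complete."""
    return status is ImportStatus.IN_PROGRESS
```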
External IDs
External IDs are string identifiers that map objects from your source system to Phonemos objects. They serve several purposes:
Uniqueness: Each object in your source system should have a unique external ID
Relationships: Parent-child relationships between objects use external IDs
Link Resolution: Links between pages reference external IDs
Problem Tracking: Errors and warnings are associated with external IDs
External IDs should be:
Stable and consistent across import runs
Unique within the import session
Human-readable when possible (for debugging)
Examples of good external IDs:
"confluence-page-12345"
"doc-abc-def-123"
"file:attachment-67890"
GraphQL API
The importer API is mutation-based. All operations are performed through GraphQL mutations that require an import_token parameter. The mutations are designed to be idempotent where possible, allowing safe retries.
Key characteristics:
All mutations require the import_token parameter
Mutations use the anonymous role (no user authentication required)
Responses include IDs that can be used in subsequent operations
Errors are returned as GraphQL errors with descriptive messages
Refer to the GraphQL schema for complete mutation signatures, input types, and response types.
Import Workflow
Step 1: User Creates Import
The user creates an import in the Phonemos UI:
Navigate to Settings → Import Content → Create Import
Select the target language
Copy the generated import token
This step is performed entirely in the Phonemos UI - your importer does not need to implement this.
Step 2: User Provides Token to Importer
The user provides the import token to your importer. This could be:
Pasted into a configuration file
Entered in a UI form
Passed as a command-line argument
Stored in environment variables
Your importer should validate the token format before proceeding.
Step 3: Start Import
Call the import_start mutation with:
import_token: The token provided by the user
source_name: A human-readable name for your source system (e.g., "Confluence", "MediaWiki")
source_url: Optional URL to the source system
importer_version: Version identifier for your importer (currently "2.0" is supported)
This mutation transitions the import from prepared to in_progress status.
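As an illustration, the request for this step could be assembled as follows. This is a sketch under several assumptions: the four parameters are passed as top-level arguments with String types, and the mutation returns an object type (so that selecting __typename is valid). The schema is the authority on both points. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed argument shape and types; check the schema for the real signature.
IMPORT_START = """
mutation ImportStart(
  $import_token: String!, $source_name: String!,
  $source_url: String, $importer_version: String!
) {
  import_start(
    import_token: $import_token, source_name: $source_name,
    source_url: $source_url, importer_version: $importer_version
  ) { __typename }
}
"""


def import_start_variables(token, source_name, source_url=None, importer_version="2.0"):
    return {
        "import_token": token,
        "source_name": source_name,
        "source_url": source_url,
        "importer_version": importer_version,
    }


def build_graphql_request(endpoint: str, query: str, variables: dict) -> urllib.request.Request:
    """Build (but do not send) a standard GraphQL-over-HTTP POST request."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

No user credentials are attached to the request because the import token in the variables is the only authorization the mutation needs.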
Step 4: Import Data
Import your content in this general order:
Ensure Users: For each user referenced in your content, call import_ensure_user to create or retrieve the user ID. Cache these IDs for reuse.
Import Page Revisions: For each page revision, call import_wikipage_revision to upload the content. Store the returned revision IDs.
Import Pages: For each page, call import_wikipage with:
The page metadata (title, parent, order)
The list of revision IDs returned by import_wikipage_revision
Link mappings (unresolved links to external IDs)
Import Files: For each file:
Call import_file_upload to get a presigned upload URL
Upload the file content to the presigned URL
Call import_file with the file metadata and revision information
Import Tasks: If your content includes tasks, call import_task for each task.
Report Problems: As you encounter issues, call import_report_problem to record errors, warnings, or informational messages.
The order matters: ensure users before importing content that references them, import revisions before importing pages, and import files before pages that reference them.
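The ordering for users, revisions, and pages might be sketched like this. The client object and its method names (ensure_user, import_revision, import_page) are hypothetical wrappers around the corresponding mutations, and the page dictionaries are an assumed in-memory shape. RecordingClient is a stand-in that only records the call order.

```python
def import_pages(client, pages):
    """Import pages in dependency order: users, then revisions, then pages."""
    user_ids = {}
    for page in pages:
        for rev in page["revisions"]:
            email = rev["author_email"]
            if email not in user_ids:  # ensure each user only once
                user_ids[email] = client.ensure_user(email)

    for page in pages:
        revision_ids = [
            client.import_revision(page["external_id"], rev, user_ids[rev["author_email"]])
            for rev in page["revisions"]
        ]
        # The page is imported last, referencing the collected revision IDs.
        client.import_page(page["external_id"], revision_ids)


class RecordingClient:
    """Stand-in for a real GraphQL client; records which mutation would be called."""

    def __init__(self):
        self.calls = []

    def ensure_user(self, email):
        self.calls.append(("import_ensure_user", email))
        return f"user-{len(self.calls)}"

    def import_revision(self, page_id, revision, publisher_user_id):
        self.calls.append(("import_wikipage_revision", page_id))
        return f"rev-{len(self.calls)}"

    def import_page(self, page_id, revision_ids):
        self.calls.append(("import_wikipage", page_id))
```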
Step 5: Handle Problems
Throughout the import process, report problems using import_report_problem:
Errors: Critical issues that prevent content from being imported
Warnings: Issues that don't prevent import but may affect content quality
Info: Informational messages about the import process
Problems are associated with external IDs, allowing users to see which objects had issues.
Step 6: Complete Upload
Once all data has been uploaded, call import_upload_complete. This signals that no more data will be uploaded and allows Phonemos to begin processing relationships and preparing the import for review.
Step 7: User Commits Import
The user reviews the imported content in the Phonemos UI and commits the import. This step is performed in the UI - your importer does not need to implement this.
Data Types and Operations
Pages (Wikipages)
Pages are the primary content type in Phonemos. A page consists of:
Metadata: Title, parent page, order within parent
Revisions: Historical versions of the page content
Links: References to other pages or files
Structure:
A page has an external_id that identifies it in your source system
A page has a parent_external_id (optional) for hierarchical organization
A page has an orderInParent (optional) for ordering siblings
A page references multiple revisions via revision IDs
Revisions:
Each revision represents a version of the page at a point in time
Revisions include: content (as JSON), publication timestamp, publisher user ID
Revisions are imported separately before the page itself
The page references all its revisions via an array of revision IDs
Link Resolution:
Pages may contain links to other pages or files
Links are initially unresolved (referencing source system URLs or identifiers)
Provide a mapping from unresolved link identifiers to external IDs
Phonemos uses this mapping to resolve links during processing
Refer to the GraphQL schema for the complete structure of import_wikipage_input and import_wikipage_revision_input.
Files
Files are binary content (documents, images, etc.) attached to pages or stored independently.
Upload Process:
Call import_file_upload with file metadata (mime type, size)
Receive a presigned upload URL and encryption key
Upload the file content directly to the presigned URL using HTTP PUT
Store the returned file_content_id for use in file revisions
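The HTTP PUT in this process could look like the sketch below, using only the standard library. It assumes the presigned URL accepts a plain PUT with a Content-Type header. How the returned encryption key must be applied is not described here, so the sketch leaves it out; consult the schema or your Phonemos administrator before relying on it.

```python
import urllib.request


def build_upload_request(presigned_url: str, content: bytes, mime_type: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request for the file content."""
    return urllib.request.Request(
        presigned_url,
        data=content,
        headers={"Content-Type": mime_type},
        method="PUT",
    )


def upload(presigned_url: str, content: bytes, mime_type: str, timeout: float = 60.0) -> int:
    """Upload the content and return the HTTP status code."""
    request = build_upload_request(presigned_url, content, mime_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```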
File Structure:
A file has an external_id identifying it in your source system
A file has a parent_external_id (typically a page external ID)
A file has multiple revisions, each referencing a file_content_id
Revisions:
Each revision represents a version of the file
Revisions include: creation timestamp, creator user ID, file content ID
All revisions for a file are provided when calling import_file
Refer to the GraphQL schema for the complete structure of import_file_file and import_file_revision.
Tasks
Tasks are actionable items that can be embedded in page content or stored independently.
Task Structure:
Each task has a UUID id (generate this in your importer)
Tasks have: description, completion status, assignee, due date
Tasks reference users via user IDs (ensure users first)
Tasks are linked to pages through the page content JSON
Importing Tasks:
Tasks that are not referenced in page content can be imported before or after pages
Tasks referenced in page content should be imported before the page that references them
Use the same task ID consistently if a task appears in multiple revisions
Refer to the GraphQL schema for the complete structure of ImportTask.
Users
Users are people who created or modified content in your source system.
Ensuring Users:
Call import_ensure_user for each unique user email address
Provide the user's email (required) and display name (optional)
The mutation returns a user ID (UUID) - cache this for reuse
The same user can be referenced multiple times - reuse the cached ID
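A small cache keeps the number of import_ensure_user calls down to one per user. In this sketch, UserCache is a hypothetical helper and ensure_user stands for whatever function issues the mutation. Normalizing emails to lower case is an assumption about how duplicates should be detected, not documented Phonemos behavior.

```python
from typing import Callable, Dict, Optional


class UserCache:
    """Remembers user IDs so each email is ensured only once."""

    def __init__(self, ensure_user: Callable[[str, Optional[str]], str]) -> None:
        self._ensure_user = ensure_user
        self._ids: Dict[str, str] = {}

    def get(self, email: str, display_name: Optional[str] = None) -> str:
        key = email.strip().lower()  # assumed normalization
        if key not in self._ids:
            self._ids[key] = self._ensure_user(email, display_name)
        return self._ids[key]
```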
User IDs in Content:
Page revisions reference publisher_user_id
File revisions reference created_by_user_id
Tasks reference created_by_user_id and assignee_user_id
Always ensure users before importing content that references them.
Refer to the GraphQL schema for the complete structure of import_ensure_user.
Problems
Problems are errors, warnings, or informational messages about the import process.
Severity Levels:
error: Critical issues that prevent content from being imported correctly
warning: Issues that don't prevent import but may affect quality
info: Informational messages about the import
Problem Structure:
Associated with an external_id (the object that had the problem)
Includes a message describing the issue
Optional details for additional context
Optional source_link to the original content
Optional affected_version if the problem is version-specific
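On the importer side, a problem can be represented by a small record that is validated before it is sent. The field names below follow the list above, except severity, whose name is assumed. The field types are also assumptions, so check import_report_problem_problem in the schema.

```python
from dataclasses import asdict, dataclass
from typing import Optional

SEVERITIES = ("error", "warning", "info")


@dataclass
class Problem:
    external_id: str
    severity: str
    message: str
    details: Optional[str] = None
    source_link: Optional[str] = None
    affected_version: Optional[str] = None  # assumed to be a string

    def __post_init__(self) -> None:
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")

    def to_variables(self) -> dict:
        """Return only the fields that are set, ready to use as mutation variables."""
        return {key: value for key, value in asdict(self).items() if value is not None}
```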
Clearing Problems:
Call import_clear_problems to remove previously reported problems for an external ID
Useful when retrying an import after fixing issues
Refer to the GraphQL schema for the complete structure of import_report_problem_problem.
GraphQL Mutations Overview
The following mutations are available for importers. Refer to the GraphQL schema for complete type definitions.
Import Lifecycle Mutations
import_start: Start an import session. Must be called before uploading any data.
import_upload_complete: Signal that all data has been uploaded. Triggers processing.
Content Mutations
import_wikipage: Import a page with its metadata and revision references.
import_wikipage_revision: Import a single page revision with content.
import_file: Import a file with its revisions.
import_file_upload: Get a presigned URL for uploading file content.
import_task: Import a task.
User Management
import_ensure_user: Create or retrieve a user by email address.
Problem Management
import_report_problem: Report an error, warning, or info message.
import_clear_problems: Clear previously reported problems for an external ID.
Utility Mutations
import_convert_confluence_page: Convert Confluence XHTML to Phonemos format (if applicable).
Mutation Dependencies
Mutations have dependencies that must be respected:
import_start must be called first
Users must be ensured before content that references them
Page revisions must be imported before pages that reference them
File content must be uploaded before file revisions reference it
import_upload_complete should be called last (after all data is uploaded)
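These dependencies form a small graph, and a valid order can be derived from it rather than hard-coded. The sketch below uses graphlib from the Python standard library (3.9+). The DEPENDENCIES table is one reading of the rules above, not something taken from the schema.

```python
from graphlib import TopologicalSorter

# Each mutation maps to the mutations that must have completed before it.
DEPENDENCIES = {
    "import_ensure_user": {"import_start"},
    "import_wikipage_revision": {"import_ensure_user"},
    "import_wikipage": {"import_wikipage_revision"},
    "import_file_upload": {"import_start"},
    "import_file": {"import_file_upload", "import_ensure_user"},
    "import_upload_complete": {"import_wikipage", "import_file"},
}


def phase_order() -> list:
    """Return one ordering of the mutations that respects every dependency."""
    return list(TopologicalSorter(DEPENDENCIES).static_order())
```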
Common Patterns
Ensuring Users Before Import:
Collect all unique user emails from your source data
Call import_ensure_user for each email
Cache the returned user IDs
Use cached IDs when importing content
Importing Pages with Revisions:
For each page, import all revisions first (collecting revision IDs)
Then import the page, referencing the revision IDs
Provide link mappings for unresolved links
Handling File Uploads:
For each file revision, call import_file_upload to get an upload URL
Upload file content to the presigned URL
Collect file content IDs
Import the file with all revisions referencing the content IDs
Implementation Guidelines
Error Handling
GraphQL mutations return errors in the standard GraphQL error format. Your importer should:
Check for errors: Always inspect the response for errors
Retry transient failures: Network errors and temporary server issues can be retried
Report persistent errors: Use import_report_problem to record errors that prevent import
Continue on non-critical errors: Don't abort the entire import for a single object failure
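Because a GraphQL server can report errors in the response body even when the HTTP request itself succeeds, the body should be inspected on every call. A minimal sketch, relying only on the standard errors and data keys of a GraphQL response (GraphQLError and unwrap are illustrative names):

```python
class GraphQLError(Exception):
    """Raised when a GraphQL response carries errors."""

    def __init__(self, messages):
        super().__init__("; ".join(messages))
        self.messages = list(messages)


def unwrap(response: dict) -> dict:
    """Return the data part of a response, or raise if errors were reported."""
    errors = response.get("errors")
    if errors:
        raise GraphQLError([error.get("message", "unknown error") for error in errors])
    data = response.get("data")
    if data is None:
        raise GraphQLError(["response contained neither data nor errors"])
    return data
```

A caller can catch GraphQLError per object, report it with import_report_problem, and continue with the next object.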
Retry Patterns
Network failures and temporary server issues are common. Implement retry logic:
Exponential backoff: Wait progressively longer between retries
Limit retries: Don't retry indefinitely
Distinguish error types: Some errors (like invalid token) shouldn't be retried
Idempotent operations: Most mutations are idempotent, allowing safe retries
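The points above can be combined into one helper. This is a sketch: PermanentError is a hypothetical marker for failures that must not be retried, and which exceptions count as transient depends on the HTTP client in use (ConnectionError and TimeoutError are used here as examples). The sleep parameter is injectable so the helper can be tested without waiting.

```python
import random
import time


class PermanentError(Exception):
    """Hypothetical marker for failures that must not be retried (e.g. invalid token)."""


def with_retries(operation, max_attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except PermanentError:
            raise  # retrying cannot help
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # retry budget exhausted
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```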
Concurrency
Importing can be parallelized for better performance:
Parallel user ensures: Multiple users can be ensured concurrently
Parallel revision imports: Page revisions can be imported concurrently
Parallel file uploads: File uploads can happen concurrently
Respect rate limits: Don't overwhelm the server with too many concurrent requests
Consider implementing:
A semaphore or similar mechanism to limit concurrent requests
Batching operations where possible
Progress tracking for concurrent operations
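One simple way to cap concurrency is a thread pool with a fixed number of workers, which plays the same role as a semaphore. The sketch below assumes worker is a function that performs one independent operation (for example, importing one revision). The limit of 4 is an arbitrary placeholder, not a documented rate limit.

```python
from concurrent.futures import ThreadPoolExecutor


def run_bounded(items, worker, max_concurrency=4):
    """Run worker(item) for every item with at most max_concurrency in flight.

    Returns (item, result, error) tuples in input order, so that one failing
    object does not abort the rest of the batch.
    """

    def safe(item):
        try:
            return item, worker(item), None
        except Exception as exc:  # collected and reported by the caller
            return item, None, exc

    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(safe, items))
```

Operations that depend on each other, such as a page and its revisions, should still be sequenced by the caller; only independent items belong in the same batch.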
Progress Tracking
For long-running imports, track progress:
Count objects: Track total pages, files, revisions to import
Report progress: Update progress as objects are imported
Handle failures: Continue importing even if some objects fail
Final summary: Report total imported, failed, and skipped objects
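A plain counter object is enough for this kind of tracking. The Progress class below is an illustrative sketch. It is not thread-safe, so with concurrent workers the counters should be updated from a single thread or guarded by a lock.

```python
from dataclasses import dataclass


@dataclass
class Progress:
    total: int
    imported: int = 0
    failed: int = 0
    skipped: int = 0

    @property
    def done(self) -> int:
        return self.imported + self.failed + self.skipped

    def summary(self) -> str:
        percent = 100 * self.done // self.total if self.total else 100
        return (
            f"{self.done}/{self.total} ({percent}%): "
            f"{self.imported} imported, {self.failed} failed, {self.skipped} skipped"
        )
```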
Token Security
The import token is sensitive and should be:
Kept secure: Never log the full token (log only a masked version)
Not exposed: Don't include tokens in error messages or user-facing output
Validated: Verify token format before making API calls
Single-use mindset: Treat tokens as single-use, even though they're valid for the entire import session
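Masking can be as simple as keeping a few characters from each end of the token. In this sketch, mask_token is a hypothetical helper and the default of four visible characters is an arbitrary choice.

```python
def mask_token(token: str, visible: int = 4) -> str:
    """Return a log-safe form of the token showing only its first and last characters."""
    if len(token) <= 2 * visible:
        # Too short to reveal anything safely: hide every character.
        return "*" * len(token)
    return f"{token[:visible]}...{token[-visible:]}"
```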
GraphQL Schema Reference
For complete API details, consult the Phonemos GraphQL schema. The schema includes:
Complete mutation signatures: All parameters, types, and return values
Input types: Detailed structure of all input objects
Response types: Structure of all response objects
Enums: All enumeration values (status types, severity levels, etc.)
Field descriptions: Documentation for each field
Accessing the Schema
The GraphQL schema can be accessed via:
Introspection: Use GraphQL introspection queries to explore the schema
Schema file: Request the schema file from your Phonemos administrator
GraphQL playground: Many GraphQL endpoints provide a playground UI for exploring the schema
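For the introspection route, a standard GraphQL introspection query can list the mutations, and the result can be filtered for the import_ prefix. This is a sketch: whether introspection is enabled for the anonymous role on a given instance is not stated here, and import_mutations is an illustrative helper that operates on the data part of the response.

```python
INTROSPECTION_QUERY = """
query ImportMutations {
  __schema {
    mutationType {
      fields {
        name
        description
        args { name type { kind name ofType { kind name } } }
      }
    }
  }
}
"""


def import_mutations(introspection_data: dict) -> list:
    """Return the sorted names of all import_* mutations in an introspection result."""
    mutation_type = introspection_data["__schema"]["mutationType"] or {}
    fields = mutation_type.get("fields") or []
    return sorted(f["name"] for f in fields if f["name"].startswith("import_"))
```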
Key Schema Locations
In the schema, look for:
Mutations: All import_* mutations in the Mutation type
Input types: Types prefixed with import_ (e.g., import_wikipage_input)
Response types: Types ending with _response (e.g., import_wikipage_response)
Enums: import_status_enum, import_report_problem_severity_enum
Example: Finding Mutation Details
To find details about import_wikipage:
Look for import_wikipage in the Mutation type
Note the input type (import_wikipage_input)
Find the definition of import_wikipage_input to see all required and optional fields
Find the response type (import_wikipage_response) to see what data is returned
Best Practices
Token Security and Handling
Validate token format before use
Mask tokens in logs (show only first/last few characters)
Never expose tokens in error messages
Store tokens securely (encrypted if persisted)
Error Handling and Reporting
Always check for GraphQL errors in responses
Report problems using import_report_problem for user visibility
Distinguish between transient and permanent errors
Continue importing even when some objects fail
Provide clear error messages with context
Performance Considerations
Batch operations: Group related operations when possible
Concurrency: Use parallel requests for independent operations
Rate limiting: Respect server capacity and rate limits
Caching: Cache user IDs and other frequently accessed data
Incremental imports: Support resuming interrupted imports
Testing Strategies
Test with small datasets first: Verify your importer works before large imports
Test error handling: Verify behavior with invalid data, network failures
Test idempotency: Verify retries don't create duplicates
Test edge cases: Empty content, missing users, circular references
Validate output: Verify imported content in Phonemos UI
Common Pitfalls to Avoid
Missing user ensures: Always ensure users before importing content that references them
Wrong revision order: Import revisions before pages that reference them
Missing upload complete: Always call import_upload_complete when done
Not reporting problems: Use import_report_problem to help users understand issues
Ignoring errors: Handle GraphQL errors appropriately
Token exposure: Never log or expose full tokens
Race conditions: Be careful with concurrent operations that depend on each other
Conclusion
Building a custom importer for Phonemos involves:
Obtaining an import token from the Phonemos UI
Starting the import with import_start
Ensuring users exist before importing content
Importing revisions, pages, files, and tasks in the correct order
Reporting problems as they occur
Completing the upload with import_upload_complete
The GraphQL schema provides all the details needed for implementation. Focus on understanding the data model, respecting dependencies between operations, and handling errors gracefully.
For specific API details, always refer to the GraphQL schema provided with your Phonemos instance.