Phonemos User Guide

Writing a Custom Importer

This guide explains how to build a custom importer for Phonemos using the GraphQL API. Importers allow you to migrate content from external systems (like Confluence, wikis, or documentation platforms) into Phonemos.

Introduction

The Phonemos importer system provides a GraphQL-based API for importing content from external systems. The import process is designed to be flexible and support various source systems while maintaining data integrity and allowing users to review imports before committing them.

Overview

An import follows this general flow:

  1. A user creates an import in the Phonemos UI and receives an import token

  2. The user provides this token to your importer

  3. Your importer uses the token to authenticate and upload data via GraphQL mutations

  4. The user reviews the imported content in Phonemos

  5. The user commits the import, making the content available

Prerequisites

  • Access to a Phonemos instance with a GraphQL endpoint

  • Understanding of GraphQL queries and mutations

  • Ability to make HTTP requests to the GraphQL endpoint

  • Access to the Phonemos GraphQL schema for type definitions

GraphQL Endpoint and Authentication

The GraphQL endpoint is typically available at:

  • https://your-instance.phonemos.com/v1/graphql

All import mutations run under the anonymous role, so no user credentials are required. The import token serves as the authorization mechanism and restricts the importer to its own import session.

Core Concepts

Import Token

An import token is a secure identifier that authorizes your importer to upload data to a specific import session. The token format is:

phonemos:import:{site-id}:{hostname}/{secret}

Where:

  • site-id is a UUID identifying the Phonemos site

  • hostname is the hostname of the Phonemos instance

  • secret is a cryptographically secure random string
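
As a rough illustration of validating that format before any API call, here is a minimal Python sketch. The names parse_import_token and ImportToken are hypothetical, and the regular expression only encodes the shape described above; Phonemos may apply stricter rules.

```python
import re
import uuid
from dataclasses import dataclass

# Shape taken from the description above: phonemos:import:{site-id}:{hostname}/{secret}
# Assumption: the site ID contains no ":" or "/" and the hostname contains no "/".
_TOKEN_RE = re.compile(r"^phonemos:import:([^:/]+):([^/]+)/(.+)$")


@dataclass(frozen=True)
class ImportToken:
    site_id: uuid.UUID
    hostname: str
    secret: str


def parse_import_token(token: str) -> ImportToken:
    """Split a token into its parts; raise ValueError if the shape is wrong."""
    match = _TOKEN_RE.match(token.strip())
    if match is None:
        # The message deliberately does not echo the token.
        raise ValueError("import token does not match the expected format")
    site_id, hostname, secret = match.groups()
    # uuid.UUID raises ValueError itself when site_id is not a valid UUID.
    return ImportToken(uuid.UUID(site_id), hostname, secret)
```

Failing early on a malformed token avoids sending a mistyped secret to the server, and the error message never contains the token itself.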

How to obtain a token: Users create imports through the Phonemos UI:

  1. Navigate to Settings → Import Content → Create Import

  2. Select the target language for the import

  3. The UI displays the import token, which the user copies and provides to your importer

The token must be kept secure and is only valid for the specific import session. Never expose tokens in logs or error messages.

Import Lifecycle

An import progresses through several states:

  1. prepared - The import has been created but not yet started

  2. in_progress - The importer has called import_start and is uploading data

  3. processing - All data has been uploaded and Phonemos is processing relationships

  4. review - The import is ready for user review

  5. committed - The user has committed the import and content is available

The importer controls the transition from prepared to in_progress via import_start. The system automatically transitions through processing to review after import_upload_complete is called. The user controls the final transition to committed via the UI.
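
The lifecycle can be modelled in an importer as a small enum. The sketch below is only an illustration: the string values are assumed to match the state names listed above (check import_status_enum in the schema), and next_status, DRIVER and accepts_uploads are hypothetical helpers.

```python
from enum import Enum


class ImportStatus(Enum):
    # Assumption: values mirror the state names listed above.
    PREPARED = "prepared"
    IN_PROGRESS = "in_progress"
    PROCESSING = "processing"
    REVIEW = "review"
    COMMITTED = "committed"


_ORDER = list(ImportStatus)

# Who triggers the transition out of each state, per the description above.
DRIVER = {
    ImportStatus.PREPARED: "importer, via import_start",
    ImportStatus.IN_PROGRESS: "importer, via import_upload_complete",
    ImportStatus.PROCESSING: "system",
    ImportStatus.REVIEW: "user, via the UI",
}


def next_status(status):
    """Return the state that follows `status`, or None for the final state."""
    index = _ORDER.index(status)
    return _ORDER[index + 1] if index + 1 < len(_ORDER) else None


def accepts_uploads(status):
    """Assumption: data can only be uploaded while the import is in progress."""
    return status is ImportStatus.IN_PROGRESS
```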

External IDs

External IDs are string identifiers that map objects from your source system to Phonemos objects. They serve several purposes:

  • Uniqueness: Each object in your source system should have a unique external ID

  • Relationships: Parent-child relationships between objects use external IDs

  • Link Resolution: Links between pages reference external IDs

  • Problem Tracking: Errors and warnings are associated with external IDs

External IDs should be:

  • Stable and consistent across import runs

  • Unique within the import session

  • Human-readable when possible (for debugging)

Examples of good external IDs:

  • "confluence-page-12345"

  • "doc-abc-def-123"

  • "file:attachment-67890"

GraphQL API

The importer API is mutation-based. All operations are performed through GraphQL mutations that require an import_token parameter. The mutations are designed to be idempotent where possible, allowing safe retries.

Key characteristics:

  • All mutations require the import_token parameter

  • Mutations use the anonymous role (no user authentication required)

  • Responses include IDs that can be used in subsequent operations

  • Errors are returned as GraphQL errors with descriptive messages

Refer to the GraphQL schema for complete mutation signatures, input types, and response types.

Import Workflow

Step 1: User Creates Import

The user creates an import in the Phonemos UI:

  • Navigate to Settings → Import Content → Create Import

  • Select the target language

  • Copy the generated import token

This step is performed entirely in the Phonemos UI - your importer does not need to implement this.

Step 2: User Provides Token to Importer

The user provides the import token to your importer. This could be:

  • Pasted into a configuration file

  • Entered in a UI form

  • Passed as a command-line argument

  • Stored in environment variables

Your importer should validate the token format before proceeding.

Step 3: Start Import

Call the import_start mutation with:

  • import_token: The token provided by the user

  • source_name: A human-readable name for your source system (e.g., "Confluence", "MediaWiki")

  • source_url: Optional URL to the source system

  • importer_version: Version identifier for your importer (currently "2.0" is supported)

This mutation transitions the import from prepared to in_progress status.
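
The following Python sketch shows one way to build the HTTP request for import_start using only the standard library. The parameter names come from the list above, but the argument types and the selection set (__typename) in the mutation text are placeholders and must be replaced with the real signature from your instance's schema. build_graphql_request and build_import_start are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: the argument types and the selection set below are placeholders.
# Replace them with the signature found in your instance's schema.
IMPORT_START = """
mutation ImportStart(
  $import_token: String!
  $source_name: String!
  $source_url: String
  $importer_version: String!
) {
  import_start(
    import_token: $import_token
    source_name: $source_name
    source_url: $source_url
    importer_version: $importer_version
  ) {
    __typename
  }
}
"""


def build_graphql_request(endpoint, query, variables):
    """Build (but do not send) a JSON POST request for a GraphQL operation."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_import_start(endpoint, import_token, source_name, source_url=None):
    """Build the import_start request with the parameters listed above."""
    variables = {
        "import_token": import_token,
        "source_name": source_name,
        "source_url": source_url,
        "importer_version": "2.0",
    }
    return build_graphql_request(endpoint, IMPORT_START, variables)
```

The request can then be sent with urllib.request.urlopen and the body decoded with json.load. Keeping construction separate from sending makes the payload easy to inspect in tests without network access.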

Step 4: Import Data

Import your content in this general order:

  1. Ensure Users: For each user referenced in your content, call import_ensure_user to create or retrieve the user ID. Cache these IDs for reuse.

  2. Import Page Revisions: For each page revision, call import_wikipage_revision to upload the content. Store the returned revision IDs.

  3. Import Pages: For each page, call import_wikipage with:

    • The page metadata (title, parent, order)

    • The list of revision IDs from step 2

    • Link mappings (unresolved links to external IDs)

  4. Import Files: For each file:

    • Call import_file_upload to get a presigned upload URL

    • Upload the file content to the presigned URL

    • Call import_file with the file metadata and revision information

  5. Import Tasks: If your content includes tasks, call import_task for each task.

  6. Report Problems: As you encounter issues, call import_report_problem to record errors, warnings, or informational messages.

The order matters: ensure users before importing content that references them, import revisions before importing pages, and import files before pages that reference them.

Step 5: Handle Problems

Throughout the import process, report problems using import_report_problem:

  • Errors: Critical issues that prevent content from being imported

  • Warnings: Issues that don't prevent import but may affect content quality

  • Info: Informational messages about the import process

Problems are associated with external IDs, allowing users to see which objects had issues.

Step 6: Complete Upload

Once all data has been uploaded, call import_upload_complete. This signals that no more data will be uploaded and allows Phonemos to begin processing relationships and preparing the import for review.

Step 7: User Commits Import

The user reviews the imported content in the Phonemos UI and commits the import. This step is performed in the UI - your importer does not need to implement this.

Data Types and Operations

Pages (Wikipages)

Pages are the primary content type in Phonemos. A page consists of:

  • Metadata: Title, parent page, order within parent

  • Revisions: Historical versions of the page content

  • Links: References to other pages or files

Structure:

  • A page has an external_id that identifies it in your source system

  • A page has a parent_external_id (optional) for hierarchical organization

  • A page has an orderInParent (optional) for ordering siblings

  • A page references multiple revisions via revision IDs

Revisions:

  • Each revision represents a version of the page at a point in time

  • Revisions include: content (as JSON), publication timestamp, publisher user ID

  • Revisions are imported separately before the page itself

  • The page references all its revisions via an array of revision IDs

Link Resolution:

  • Pages may contain links to other pages or files

  • Links are initially unresolved (referencing source system URLs or identifiers)

  • Provide a mapping from unresolved link identifiers to external IDs

  • Phonemos uses this mapping to resolve links during processing

Refer to the GraphQL schema for the complete structure of import_wikipage_input and import_wikipage_revision_input.

Files

Files are binary content (documents, images, etc.) attached to pages or stored independently.

Upload Process:

  1. Call import_file_upload with file metadata (mime type, size)

  2. Receive a presigned upload URL and encryption key

  3. Upload the file content directly to the presigned URL using HTTP PUT

  4. Store the returned file_content_id for use in file revisions
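
Step 3 of the upload process could look roughly like the Python sketch below, which uses only the standard library. It assumes the presigned URL accepts a plain PUT with a Content-Type header; how the returned encryption key is meant to be used is not described here, so the sketch leaves it out. build_upload_request and upload are hypothetical names.

```python
import urllib.request


def build_upload_request(upload_url, content, mime_type):
    """Build an HTTP PUT request carrying the raw file bytes."""
    return urllib.request.Request(
        upload_url,
        data=content,
        # Assumption: the presigned URL expects the same MIME type that was
        # passed to import_file_upload.
        headers={"Content-Type": mime_type},
        method="PUT",
    )


def upload(request, timeout=60.0):
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Presigned URLs often reject requests whose headers differ from what was signed, so if an upload fails with an authorization error, compare the headers you send against what the server expects.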

File Structure:

  • A file has an external_id identifying it in your source system

  • A file has a parent_external_id (typically a page external ID)

  • A file has multiple revisions, each referencing a file_content_id

Revisions:

  • Each revision represents a version of the file

  • Revisions include: creation timestamp, creator user ID, file content ID

  • All revisions for a file are provided when calling import_file

Refer to the GraphQL schema for the complete structure of import_file_file and import_file_revision.

Tasks

Tasks are actionable items that can be embedded in page content or stored independently.

Task Structure:

  • Each task has a UUID id (generate this in your importer)

  • Tasks have: description, completion status, assignee, due date

  • Tasks reference users via user IDs (ensure users first)

  • Tasks are linked to pages through the page content JSON

Importing Tasks:

  • Tasks that are not referenced in page content can be imported before or after pages (order doesn't matter)

  • Tasks referenced in page content should be imported before the page

  • Use the same task ID consistently if a task appears in multiple revisions
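
One way to keep task IDs consistent across revisions and across import runs is to derive them deterministically from the source system's task identifier. The sketch below uses name-based UUIDs (version 5) from Python's standard library. This is a suggestion, not a documented requirement: whether Phonemos accepts any UUID version is an assumption, and TASK_NAMESPACE and task_id_for are hypothetical names.

```python
import uuid

# Hypothetical namespace for this importer; any fixed UUID works as long as
# it never changes between runs.
TASK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/my-importer/tasks")


def task_id_for(source_task_id: str) -> str:
    """Return the same UUID every time for the same source task identifier."""
    return str(uuid.uuid5(TASK_NAMESPACE, source_task_id))
```

Because the ID is a pure function of the source identifier, the importer does not need to persist a lookup table between runs.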

Refer to the GraphQL schema for the complete structure of ImportTask.

Users

Users are people who created or modified content in your source system.

Ensuring Users:

  • Call import_ensure_user for each unique user email address

  • Provide the user's email (required) and display name (optional)

  • The mutation returns a user ID (UUID) - cache this for reuse

  • The same user can be referenced multiple times - reuse the cached ID

User IDs in Content:

  • Page revisions reference publisher_user_id

  • File revisions reference created_by_user_id

  • Tasks reference created_by_user_id and assignee_user_id

Always ensure users before importing content that references them.

Refer to the GraphQL schema for the complete structure of import_ensure_user.
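
The caching advice above can be sketched as a small wrapper around whatever function performs the import_ensure_user call. UserCache is a hypothetical name, ensure_user stands in for your own GraphQL call, and normalizing emails to lower case is an assumption about how your source data should be deduplicated.

```python
class UserCache:
    """Remember user IDs so each email is ensured at most once."""

    def __init__(self, ensure_user):
        # ensure_user(email, display_name) -> user ID; supplied by the caller,
        # typically a thin wrapper around the import_ensure_user mutation.
        self._ensure_user = ensure_user
        self._ids = {}

    def get(self, email, display_name=None):
        # Assumption: emails that differ only in case or surrounding
        # whitespace refer to the same user.
        key = email.strip().lower()
        if key not in self._ids:
            self._ids[key] = self._ensure_user(email, display_name)
        return self._ids[key]
```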

Problems

Problems are errors, warnings, or informational messages about the import process.

Severity Levels:

  • error: Critical issues that prevent content from being imported correctly

  • warning: Issues that don't prevent import but may affect quality

  • info: Informational messages about the import

Problem Structure:

  • Associated with an external_id (the object that had the problem)

  • Includes a message describing the issue

  • Optional details for additional context

  • Optional source_link to the original content

  • Optional affected_version if the problem is version-specific

Clearing Problems:

  • Call import_clear_problems to remove previously reported problems for an external ID

  • Useful when retrying an import after fixing issues

Refer to the GraphQL schema for the complete structure of import_report_problem_problem.
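
A problem record can be assembled and checked locally before it is sent. In the Python sketch below, the field names external_id, message, details, source_link and affected_version come from the list above; the key name "severity" and the helper build_problem are assumptions, so compare them with import_report_problem_problem in the schema.

```python
SEVERITIES = ("error", "warning", "info")


def build_problem(external_id, severity, message,
                  details=None, source_link=None, affected_version=None):
    """Return a dict describing one problem, omitting unset optional fields."""
    if severity not in SEVERITIES:
        raise ValueError("severity must be one of: " + ", ".join(SEVERITIES))
    problem = {
        "external_id": external_id,
        "severity": severity,  # assumed key name, see the schema
        "message": message,
    }
    optional = {
        "details": details,
        "source_link": source_link,
        "affected_version": affected_version,
    }
    # Only include optional fields that were actually provided.
    problem.update({key: value for key, value in optional.items() if value is not None})
    return problem
```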

GraphQL Mutations Overview

The following mutations are available for importers. Refer to the GraphQL schema for complete type definitions.

Import Lifecycle Mutations

  • import_start: Start an import session. Must be called before uploading any data.

  • import_upload_complete: Signal that all data has been uploaded. Triggers processing.

Content Mutations

  • import_wikipage: Import a page with its metadata and revision references.

  • import_wikipage_revision: Import a single page revision with content.

  • import_file: Import a file with its revisions.

  • import_file_upload: Get a presigned URL for uploading file content.

  • import_task: Import a task.

User Management

  • import_ensure_user: Create or retrieve a user by email address.

Problem Management

  • import_report_problem: Report an error, warning, or info message.

  • import_clear_problems: Clear previously reported problems for an external ID.

Utility Mutations

  • import_convert_confluence_page: Convert Confluence XHTML to Phonemos format (if applicable).

Mutation Dependencies

Mutations have dependencies that must be respected:

  1. import_start must be called first

  2. Users must be ensured before content that references them

  3. Page revisions must be imported before pages that reference them

  4. File content must be uploaded before file revisions reference it

  5. import_upload_complete should be called last (after all data is uploaded)

Common Patterns

Ensuring Users Before Import:

  1. Collect all unique user emails from your source data

  2. Call import_ensure_user for each email

  3. Cache the returned user IDs

  4. Use cached IDs when importing content

Importing Pages with Revisions:

  1. For each page, import all revisions first (collecting revision IDs)

  2. Then import the page, referencing the revision IDs

  3. Provide link mappings for unresolved links
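
The three steps above can be sketched in Python as follows. The two callables stand in for your own wrappers around import_wikipage_revision and import_wikipage, and the dictionary keys "revision_ids" and "links" are hypothetical; the real field names are defined by import_wikipage_input in the schema.

```python
def import_page_with_revisions(page, import_revision, import_page):
    """Import all revisions of a page, then the page that references them.

    import_revision(revision) -> revision ID
    import_page(page_input)   -> whatever the page mutation returns
    """
    # Step 1: upload every revision first and collect the returned IDs in order.
    revision_ids = [import_revision(revision) for revision in page["revisions"]]

    # Step 2: import the page, referencing those IDs.
    page_input = {
        "external_id": page["external_id"],
        "title": page["title"],
        "parent_external_id": page.get("parent_external_id"),
        "revision_ids": revision_ids,  # hypothetical field name
        # Step 3: unresolved link identifier -> external ID of the target.
        "links": dict(page.get("links", {})),  # hypothetical field name
    }
    return import_page(page_input)
```

Passing the API calls in as functions keeps the ordering logic testable without a running Phonemos instance.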

Handling File Uploads:

  1. For each file revision, call import_file_upload to get an upload URL

  2. Upload file content to the presigned URL

  3. Collect file content IDs

  4. Import the file with all revisions referencing the content IDs

Implementation Guidelines

Error Handling

GraphQL mutations return errors in the standard GraphQL error format. Your importer should:

  • Check for errors: Always inspect the response for errors

  • Retry transient failures: Network errors and temporary server issues can be retried

  • Report persistent errors: Use import_report_problem to record errors that prevent import

  • Continue on non-critical errors: Don't abort the entire import for a single object failure
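
A GraphQL response is a JSON object that may carry an "errors" list alongside or instead of "data", even when the HTTP status is 200. The Python sketch below shows one way to check for that; GraphQLRequestError and unwrap are hypothetical names.

```python
class GraphQLRequestError(Exception):
    """Raised when a GraphQL response contains an "errors" entry."""

    def __init__(self, errors):
        self.errors = errors
        messages = "; ".join(str(e.get("message", "unknown error")) for e in errors)
        super().__init__(messages)


def unwrap(response):
    """Return the "data" part of a decoded GraphQL response or raise."""
    errors = response.get("errors")
    if errors:
        raise GraphQLRequestError(errors)
    return response.get("data") or {}
```

Callers can catch GraphQLRequestError per object, report it with import_report_problem, and continue with the next object rather than aborting the whole import.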

Retry Patterns

Network failures and temporary server issues are common. Implement retry logic:

  • Exponential backoff: Wait progressively longer between retries

  • Limit retries: Don't retry indefinitely

  • Distinguish error types: Some errors (like invalid token) shouldn't be retried

  • Idempotent operations: Most mutations are idempotent, allowing safe retries
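
A minimal retry helper along these lines might look like the Python sketch below. The defaults (five attempts, a 0.5 second base delay, retrying only ConnectionError and TimeoutError) are illustrative assumptions, and with_retries is a hypothetical name. Any other exception, such as one raised for an invalid token, propagates immediately.

```python
import time


def with_retries(operation, max_attempts=5, base_delay=0.5,
                 retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The delay doubles after each failed attempt: base_delay, 2x, 4x, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

Only wrap mutations you know to be idempotent. For the others, a retry after an ambiguous failure could create duplicates.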

Concurrency

Importing can be parallelized for better performance:

  • Parallel user ensures: Multiple users can be ensured concurrently

  • Parallel revision imports: Page revisions can be imported concurrently

  • Parallel file uploads: File uploads can happen concurrently

  • Respect rate limits: Don't overwhelm the server with too many concurrent requests

Consider implementing:

  • A semaphore or similar mechanism to limit concurrent requests

  • Batching operations where possible

  • Progress tracking for concurrent operations
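
As one possible shape for bounded concurrency, the Python sketch below uses a thread pool whose worker count caps the number of requests in flight. import_concurrently is a hypothetical name, import_one stands in for any independent operation (for example a single revision import), and the default of four workers is an arbitrary assumption rather than a documented limit.

```python
from concurrent.futures import ThreadPoolExecutor


def import_concurrently(items, import_one, max_workers=4):
    """Run import_one(item) for every item with at most max_workers in flight.

    Returns (succeeded, failed): lists of (item, result) and (item, exception),
    both in the original item order.
    """
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [(item, pool.submit(import_one, item)) for item in items]
        for item, future in futures:
            try:
                succeeded.append((item, future.result()))
            except Exception as exc:
                # One failing object must not abort the rest of the batch.
                failed.append((item, exc))
    return succeeded, failed
```

Use this only for operations that do not depend on each other. A page must still wait until all of its revisions have been imported.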

Progress Tracking

For long-running imports, track progress:

  • Count objects: Track total pages, files, revisions to import

  • Report progress: Update progress as objects are imported

  • Handle failures: Continue importing even if some objects fail

  • Final summary: Report total imported, failed, and skipped objects

Token Security

The import token is sensitive and should be:

  • Kept secure: Never log the full token (log only a masked version)

  • Not exposed: Don't include tokens in error messages or user-facing output

  • Validated: Verify token format before making API calls

  • Single-use mindset: Treat each token as belonging to one import run and discard it afterwards, even though it remains valid for the entire import session
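
A masked version for logging can be produced with a helper like the Python sketch below. mask_token is a hypothetical name and the number of visible characters is an arbitrary choice; showing fewer characters is safer.

```python
def mask_token(token, visible=4):
    """Return a log-safe form that keeps only the first and last few characters."""
    if len(token) <= visible * 2:
        # Too short to show anything without revealing most of it.
        return "*" * len(token)
    return f"{token[:visible]}...{token[-visible:]}"
```

For example, mask_token("phonemos:import:abc:host/supersecret") returns "phon...cret".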

GraphQL Schema Reference

For complete API details, consult the Phonemos GraphQL schema. The schema includes:

  • Complete mutation signatures: All parameters, types, and return values

  • Input types: Detailed structure of all input objects

  • Response types: Structure of all response objects

  • Enums: All enumeration values (status types, severity levels, etc.)

  • Field descriptions: Documentation for each field

Accessing the Schema

The GraphQL schema can be accessed via:

  • Introspection: Use GraphQL introspection queries to explore the schema

  • Schema file: Request the schema file from your Phonemos administrator

  • GraphQL playground: Many GraphQL endpoints provide a playground UI for exploring the schema

Key Schema Locations

In the schema, look for:

  • Mutations: All import_* mutations in the Mutation type

  • Input types: Types prefixed with import_ (e.g., import_wikipage_input)

  • Response types: Types ending with _response (e.g., import_wikipage_response)

  • Enums: import_status_enum, import_report_problem_severity_enum

Example: Finding Mutation Details

To find details about import_wikipage:

  1. Look for import_wikipage in the Mutation type

  2. Note the input type (import_wikipage_input)

  3. Find the definition of import_wikipage_input to see all required and optional fields

  4. Find the response type (import_wikipage_response) to see what data is returned
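
If introspection is enabled for your instance (it may be disabled or restricted for the anonymous role), the lookup above can be partly automated. The Python sketch below contains a standard introspection query and a helper that filters the decoded "data" part of the response; MUTATIONS_QUERY and import_mutations are hypothetical names.

```python
# Standard GraphQL introspection: list every mutation with its argument names.
MUTATIONS_QUERY = """
query ImportMutations {
  __schema {
    mutationType {
      fields {
        name
        args { name }
      }
    }
  }
}
"""


def import_mutations(data):
    """Map each import_* mutation name to the list of its argument names.

    `data` is the decoded "data" object of the introspection response.
    """
    mutation_type = (data.get("__schema") or {}).get("mutationType") or {}
    return {
        field["name"]: [arg["name"] for arg in field.get("args") or []]
        for field in mutation_type.get("fields") or []
        if field["name"].startswith("import_")
    }
```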

Best Practices

Token Security and Handling

  • Validate token format before use

  • Mask tokens in logs (show only first/last few characters)

  • Never expose tokens in error messages

  • Store tokens securely (encrypted if persisted)

Error Handling and Reporting

  • Always check for GraphQL errors in responses

  • Report problems using import_report_problem for user visibility

  • Distinguish between transient and permanent errors

  • Continue importing even when some objects fail

  • Provide clear error messages with context

Performance Considerations

  • Batch operations: Group related operations when possible

  • Concurrency: Use parallel requests for independent operations

  • Rate limiting: Respect server capacity and rate limits

  • Caching: Cache user IDs and other frequently accessed data

  • Incremental imports: Support resuming interrupted imports

Testing Strategies

  • Test with small datasets first: Verify your importer works before large imports

  • Test error handling: Verify behavior with invalid data, network failures

  • Test idempotency: Verify retries don't create duplicates

  • Test edge cases: Empty content, missing users, circular references

  • Validate output: Verify imported content in Phonemos UI

Common Pitfalls to Avoid

  • Missing user ensures: Always ensure users before importing content that references them

  • Wrong revision order: Import revisions before pages that reference them

  • Missing upload complete: Always call import_upload_complete when done

  • Not reporting problems: Use import_report_problem to help users understand issues

  • Ignoring errors: Handle GraphQL errors appropriately

  • Token exposure: Never log or expose full tokens

  • Race conditions: Be careful with concurrent operations that depend on each other

Conclusion

Building a custom importer for Phonemos involves:

  1. Obtaining an import token from the Phonemos UI

  2. Starting the import with import_start

  3. Ensuring users exist before importing content

  4. Importing revisions, pages, files, and tasks in the correct order

  5. Reporting problems as they occur

  6. Completing the upload with import_upload_complete

The GraphQL schema provides all the details needed for implementation. Focus on understanding the data model, respecting dependencies between operations, and handling errors gracefully.

For specific API details, always refer to the GraphQL schema provided with your Phonemos instance.