External Integrations Architecture & Operations
1. Introduction
In line with our wider pipeline architecture, we have developed a set of pipelines to handle integrations with external third-party API providers. These pipelines provide a consistent, secure, and operationally robust mechanism for invoking external services such as credit reference agencies, while ensuring that consuming systems do not need to handle provider-specific authentication, API behaviour, retries, or error handling.
The integrations are implemented as Ruby-based pipelines that can be executed in two primary ways:
- Command-line execution, supporting operational use cases, testing, and controlled re-runs
- HTTP-triggered execution, enabling invocation from upstream applications and user interfaces (for example, via Power Automate)
At present, the following external integration pipelines are in scope:
- Experian Pipeline – supporting company and individual credit-related API calls
- Creditsafe Pipeline – supporting company, director, and consumer credit-related API calls
Each pipeline follows a shared architectural pattern covering:
- explicit user intent
- authentication and token management
- execution control and retry handling
- structured logging and auditability
- secure persistence of responses where required
This document describes the architecture, processing model, security controls (such as authentication, authorisation, and data protection), and operational considerations for these external integrations. It is intentionally focused on the integration layer and does not attempt to fully document downstream processes unless they are directly impacted by the integration process.
2. Scope & Objectives
2.1 In Scope
- Architecture of the external integration pipelines
- Interaction patterns with Experian and Creditsafe
- Error handling, retries, and reprocessing controls
- Security, audit, and compliance considerations
- Operational run-book and support model
2.2 Out of Scope
- Business intelligence and reporting models
- Detailed UI implementation (covered in separate documentation)
- Integration with Alph4 APIs (currently in the planning stage)
3. Architecture
3.1 Detailed Architecture Overview
The external integration pipelines provide a dedicated integration layer between Leasepath and third-party API providers. This layer is responsible for executing external API calls in a controlled, secure, and observable manner, while shielding upstream systems from provider-specific concerns.
The integration layer is responsible for:
- authentication and token management
- execution control (timeouts, retries, backoff)
- provider-specific request and response handling
- structured logging and auditability
- persistence of external responses for downstream consumption
Upstream systems interact with the integration layer via an HTTP interface or controlled command-line execution. They do not interact directly with third-party providers.
```mermaid
flowchart LR
    A[Upstream System / UI] --> B[HTTP Layer]
    B --> C[Integration Pipeline]
    C --> D[External API Provider]
    C --> E[Logging & Monitoring]
    C --> F[Persistent Storage Dynamics / Data Lake]
```
3.2 Execution Environment & Deployment Model
The integration pipelines are implemented as Ruby services and are deployed on Windows Server. They are designed to support asynchronous invocation via an HTTP interface, where incoming requests initiate a pipeline execution and return immediately, with results persisted for later consumption. Pipelines may also be executed directly via the command line for operational and support purposes.
Key characteristics include:
- environment-specific configuration supplied via environment variables
- no hard-coded credentials or endpoints
- consistent execution behaviour regardless of invocation method
3.3 Authentication & Provider Isolation
Authentication with third-party providers is handled within the integration layer using provider-specific token mechanisms. Token acquisition, caching, and refresh logic is encapsulated within the pipelines and is not exposed to upstream callers.
This ensures that:
- consuming systems are not coupled to provider authentication models
- credentials are centrally managed and rotated
- provider-specific behaviour is isolated to the relevant pipeline
3.4 Observability, Audit & Persistence
All pipeline executions are fully observable and auditable. Each execution is associated with correlation identifiers that link:
- the originating request
- external API calls
- logs and metrics
- persisted response artefacts
External API responses are persisted to Dynamics tables (and, where appropriate, additional storage such as SharePoint).
3.5 Architectural Constraints & Assumptions
The integration architecture operates under the following explicit constraints:
- **User-initiated execution only.** Credit-impacting API calls are only executed as a result of explicit user actions. Automated or scheduled credit checks are not permitted.
- **Provider dependency.** The integration layer depends on the availability and contractual stability of third-party provider APIs.
- **Fail-fast behaviour.** Invalid input, authentication failures, or unrecoverable errors result in immediate failure rather than silent degradation.
- **Incremental evolution.** The architecture is designed to evolve incrementally, with improvements to idempotency, observability, and audit integration introduced without changing upstream contracts.
4. External Integration Execution Model
4.1 Pipeline Execution
Each pipeline follows a consistent execution lifecycle:
- Input validation and intent verification
- Authentication token acquisition
- External API request execution
- Response validation and normalisation
- Persistence (where applicable)
- Structured logging
```mermaid
sequenceDiagram
    participant Caller
    participant Pipeline
    participant Provider
    participant Log
    Caller->>Pipeline: Execute request
    Pipeline->>Provider: API call
    Provider-->>Pipeline: Response
    Pipeline->>Log: Write audit & metrics
    Pipeline-->>Caller: Result / Error
```
The caller does not wait for the external API call to complete; execution continues asynchronously after the initial request is accepted.
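The sketch below illustrates this lifecycle in Ruby. The class and method names (`IntegrationPipeline`, `validate!`, `normalise`, and the injected collaborators) are illustrative assumptions; the actual pipelines follow the same sequence but are structured per provider.

```ruby
require "json"
require "logger"
require "securerandom"

# Illustrative lifecycle only; names and structure are assumptions,
# not the actual pipeline classes.
class IntegrationPipeline
  def initialize(token_provider:, client:, store:, logger: Logger.new($stdout))
    @token_provider = token_provider
    @client         = client
    @store          = store
    @logger         = logger
  end

  def run(request)
    correlation_id = request[:correlation_id] || SecureRandom.uuid

    validate!(request)                                # 1. input validation & intent verification
    token    = @token_provider.get                    # 2. authentication token acquisition
    response = @client.call(request, token: token)    # 3. external API request execution
    result   = normalise(response)                    # 4. response validation & normalisation
    @store.save(correlation_id, result)               # 5. persistence (where applicable)
    @logger.info({ correlation_id: correlation_id, status: "ok" }.to_json) # 6. structured logging
    result
  rescue StandardError => e
    @logger.error({ correlation_id: correlation_id, error: e.class.name, message: e.message }.to_json)
    raise # fail fast rather than degrade silently
  end

  private

  def validate!(request)
    raise ArgumentError, "explicit action/intent is required" unless request[:action]
  end

  def normalise(response)
    JSON.parse(response.body)
  end
end
```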
4.2 Request, Response & Execution Model
The external integration pipelines expose a well-defined execution model for invoking third-party APIs. Each pipeline defines the inputs it accepts, the external services it interacts with, and the controls applied during execution.
Each integration is defined in terms of:
- **Request inputs.** Business identifiers and parameters required to determine the external API call to be made.
- **Execution options.** Runtime controls that govern how the integration behaves, including timeouts, retry limits, backoff strategy, and diagnostic modes.
- **External API interactions.** The third-party endpoints invoked and any provider-specific constraints or behaviours.
- **Response handling.** How responses are returned, logged, and optionally persisted for audit or reprocessing purposes.
- **Correlation and traceability.** Identifiers used to link requests, external API calls, logs, and persisted artefacts across the execution lifecycle.
This model ensures that each external API call is executed in a controlled, observable, and auditable manner, with clear separation between business intent, execution behaviour, and operational concerns.
4.2.1 Request Inputs & Identifiers
Each pipeline requires a minimal set of business identifiers to determine the external API call to be executed. Examples include:
- Company registration number
- Provider-specific identifiers (e.g. `connectId`, `peopleId`)
- Explicit action or intent (e.g. search vs credit report)
- Correlation ID
These identifiers are validated prior to execution and are included in structured logs to support traceability.
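By way of illustration, a validated set of inputs for a company-level request might look like the hash below; the key names and values are examples only, and each pipeline defines the identifiers it actually accepts.

```ruby
require "securerandom"

# Example inputs only; keys and values are illustrative, not a fixed contract.
request = {
  action:              "company_search",  # explicit intent (search vs credit report)
  registration_number: "01234567",        # company registration number
  connect_id:          nil,               # provider-specific identifier, if already known
  correlation_id:      SecureRandom.uuid, # supplied by the caller or generated per execution
}
```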
4.2.2 Execution Options (Runtime Controls)
In addition to business identifiers, each pipeline exposes a set of execution options that control how API calls are made. These options are consistent across providers to ensure predictable operational behaviour.
Common execution options include:
| Option | Description | Purpose |
|---|---|---|
| `http_timeout_seconds` | HTTP open/read timeout for external API calls | Prevents indefinite blocking |
| `http_max_attempts` | Maximum number of retry attempts | Bounds retry behaviour |
| `http_backoff_base_seconds` | Base delay for exponential backoff | Controls retry pacing |
| `http_backoff_max_seconds` | Maximum backoff delay | Prevents excessive wait times |
| `debug` | Enables logging of raw API responses | Diagnostic use only |
| `dry_run` | Logs intended API calls without executing them | Safe testing and validation |
These options are available whether the pipeline is invoked via the command line or through the HTTP layer, ensuring consistent behaviour across execution contexts.
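A minimal sketch of how these options might be merged with defaults is shown below; the timeout and retry defaults mirror section 6.2, while the backoff defaults and the merging code itself are illustrative assumptions.

```ruby
# Conservative defaults, overridable per execution; the backoff values shown
# here are assumptions for illustration.
DEFAULT_OPTIONS = {
  http_timeout_seconds:      30,
  http_max_attempts:         4,
  http_backoff_base_seconds: 1,
  http_backoff_max_seconds:  30,
  debug:                     false,
  dry_run:                   false,
}.freeze

def execution_options(overrides = {})
  DEFAULT_OPTIONS.merge(overrides)
end

options = execution_options(http_timeout_seconds: 10, dry_run: true)
```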
4.2.3 Provider-Specific Requests
For each provider, the pipeline maps validated inputs and execution options to one or more external API calls.
Examples include:
- Experian
- Company search and credit-related endpoints
- Authentication via token-based access
- Strict timeout guidance as per provider recommendations
- Creditsafe
- Company, director, and consumer endpoints
- Distinct identifiers for companies and individuals
- Explicit handling of ambiguous search results
Provider-specific request logic is encapsulated within the relevant pipeline to prevent leakage of provider concerns into upstream systems.
4.2.4 Response Artefacts & Persistence
The integration pipelines are not designed to stream full third-party responses synchronously back to callers.
For each successful external API invocation, the response is written to a Dynamics table associated with the originating request. This enables downstream systems and user interfaces to retrieve results asynchronously and ensures a durable audit trail.
The HTTP interface returns an acknowledgement of execution rather than response data.
4.2.5 Correlation, Audit & Traceability
Every pipeline execution is assigned a correlation identifier that is propagated across:
- Incoming request
- External API calls
- Logs and metrics
- Persisted artefacts (where applicable)
This enables:
- End-to-end traceability
- Safe operational investigation
- Controlled reprocessing without unintended duplicate actions
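As a sketch, correlation handling amounts to generating (or accepting) a single identifier and attaching it everywhere; the header name and log shape below are assumptions for illustration.

```ruby
require "json"
require "logger"
require "securerandom"

correlation_id = SecureRandom.uuid
logger = Logger.new($stdout)

# The same identifier appears in structured log entries...
logger.info({ correlation_id: correlation_id, event: "external_call_started" }.to_json)

# ...on the outbound request (header name is an assumption)...
headers = { "X-Correlation-Id" => correlation_id }

# ...and on any persisted artefact, so the execution can be traced end to end.
artefact = { correlation_id: correlation_id, payload: "<provider response>" }
```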
4.2.6 Validation & Guardrails
Prior to execution, the pipelines enforce a set of guardrails, including:
- Mandatory input validation (e.g. registration number required)
- Validation of execution option ranges (e.g. timeouts > 0)
- Explicit failure for invalid or incomplete requests
These controls ensure that invalid requests fail early and predictably, before any external API calls are made.
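A minimal sketch of these guardrails, assuming the option names from section 4.2.2 (the method itself is illustrative rather than the actual pipeline code):

```ruby
# Fail early, before any external API call is made.
def validate_request!(request, options)
  raise ArgumentError, "registration number is required" if request[:registration_number].to_s.strip.empty?
  raise ArgumentError, "http_timeout_seconds must be > 0" unless options[:http_timeout_seconds].to_i.positive?
  raise ArgumentError, "http_max_attempts must be >= 1"   unless options[:http_max_attempts].to_i >= 1
end

validate_request!({ registration_number: "01234567" },
                  { http_timeout_seconds: 30, http_max_attempts: 4 })
```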
4.3 Error Handling & Reprocessing
Failure scenarios handled include:
- Network timeouts
- Authentication failures
- Provider-side errors
- Ambiguous or multiple search results
Key controls:
- Bounded retry logic
- No automatic retries for credit report actions without safeguards
- Operator-driven reprocessing using correlation identifiers
- Clear distinction between transient and terminal failures
4.4 Authentication & Token Management
External API providers typically require short-lived access tokens. The integration pipelines centralise token handling so that individual API calls do not need to implement provider-specific token lifecycle logic.
Token Provider Pattern
Each provider pipeline composes a TokenProvider responsible for obtaining access tokens. Token providers follow a common interface:
- `get()` returns a valid access token
- tokens are cached in-memory for the lifetime of the running process
- refresh occurs automatically when the token is expired (or close to expiry)
This pattern ensures:
- consistent authentication behaviour across providers
- fewer token endpoint calls (reduced load and reduced failure surface)
- separation of concerns (auth logic not duplicated across API calls)
Cached Token Provider
Token caching is implemented using a shared component:
- The token is stored in-memory (`cached_token`)
- The expiry time is tracked as an epoch timestamp (`cached_expiry_epoch`)
- If a token is still valid, `get()` returns it immediately
- Otherwise, the provider-specific `fetch` function is invoked to obtain a new token
A small safety buffer is applied to avoid using tokens that are close to expiry:
`cached_expiry_epoch = now + expires_in - 60`
This “refresh 60 seconds early” behaviour helps reduce race conditions where a token expires mid-request.
If token acquisition fails, the error is:
- logged
- re-raised to ensure the pipeline fails fast and predictably
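A minimal sketch of the cached token provider follows. The field names (`cached_token`, `cached_expiry_epoch`) and the 60-second refresh buffer match the description above; the class and method structure is illustrative rather than the exact implementation.

```ruby
# Minimal sketch of the shared cached token provider pattern.
class CachedTokenProvider
  REFRESH_BUFFER_SECONDS = 60 # refresh slightly early to avoid expiry mid-request

  # fetch: a provider-specific callable returning { token:, expires_in: }
  def initialize(fetch:, logger:)
    @fetch  = fetch
    @logger = logger
    @cached_token        = nil
    @cached_expiry_epoch = 0
  end

  def get
    now = Time.now.to_i
    return @cached_token if @cached_token && now < @cached_expiry_epoch

    result = @fetch.call
    @cached_token        = result[:token]
    @cached_expiry_epoch = now + result[:expires_in].to_i - REFRESH_BUFFER_SECONDS
    @cached_token
  rescue StandardError => e
    @logger.error("token acquisition failed: #{e.class}: #{e.message}") # never log the token itself
    raise # fail fast and predictably
  end
end
```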
Provider-Specific Token Acquisition (Example: Creditsafe)
Creditsafe authentication is implemented via a provider-specific TokenProvider that:
- reads the token endpoint and credentials from environment configuration
- performs a POST to the provider token endpoint
- extracts the token from the JSON response
- delegates caching/refresh behaviour to the shared cached token provider
This keeps Creditsafe-specific details isolated while still using the common caching behaviour.
Configuration inputs (Creditsafe)
- `CREDITSAFE_TOKEN_URL`
- `CREDITSAFE_USERNAME`
- `CREDITSAFE_PASSWORD`
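The sketch below shows how a Creditsafe-specific fetch function might be composed with the cached token provider above. The JSON request and response shape (username/password in the body, a `token` field in the response) is an assumption for illustration; the provider's own documentation defines the actual contract.

```ruby
require "json"
require "net/http"
require "uri"

creditsafe_fetch = lambda do
  uri = URI(ENV.fetch("CREDITSAFE_TOKEN_URL"))

  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
    request.body = {
      username: ENV.fetch("CREDITSAFE_USERNAME"),
      password: ENV.fetch("CREDITSAFE_PASSWORD"),
    }.to_json
    http.request(request)
  end

  raise "token request failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  body = JSON.parse(response.body)
  # Conservative fallback where expires_in is not supplied (see notes below).
  { token: body["token"], expires_in: body.fetch("expires_in", 3600) }
end

# Composed with the CachedTokenProvider sketched above:
# provider = CachedTokenProvider.new(fetch: creditsafe_fetch, logger: Logger.new($stdout))
# provider.get
```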
Notes and Considerations
In-memory scope
- Token caching is per-process. Tokens are not shared across multiple running pipeline processes or servers.
Expiry handling
- The cached token provider expects token metadata including `expires_in` (seconds) to calculate expiry.
- Where a provider does not supply `expires_in`, a default or conservative expiry strategy should be defined to prevent tokens being treated as valid indefinitely.
Logging
- Token acquisition failures are logged with error class and message.
- Tokens and credentials must never be logged.
5. Security, Controls & Compliance
5.1 Access & Role Model
- Separation between:
- users initiating requests
- services executing integrations
5.2 Data Protection
- All data in transit encrypted using TLS
- Data at rest encrypted in:
- Azure Data Lake
- SQL databases (where applicable)
- Sensitive fields are handled as PII; no sensitive data is persisted beyond what is required for operational use (e.g. PDF credit reports stored in SharePoint, API response data stored in Dynamics)
5.3 Audit & Logging
- Every request assigned a correlation identifier
- Logs capture:
- request metadata
- provider response status
- execution timing
6. Technology Stack & Environments
6.1 Technology Inventory
- Ruby (integration pipelines)
- Windows Server hosting
- External APIs (Experian, Creditsafe)
- Azure Data Lake
- SQL Database
- Centralised logging and alerting
6.2 Integration Configuration
External integrations are configured entirely via environment variables. This ensures that sensitive information is not embedded in code and that configuration can vary cleanly between environments (e.g. sandbox vs production).
The following configuration categories are relevant to the Experian and Creditsafe pipelines.
Provider Endpoints
Each provider exposes one or more base URLs that define the external API surfaces used by the integration pipelines.
Experian
- `EXPERIAN_API_BASE_URL`: Base URL for Experian API requests.
- `EXPERIAN_TOKEN_URL`: OAuth token endpoint used for authentication.
Creditsafe
- `CREDITSAFE_API_BASE_URL`: Base URL for Creditsafe Connect API requests.
- `CREDITSAFE_TOKEN_URL`: Authentication endpoint used to obtain access tokens.
These values are environment-specific and differ between sandbox and production deployments.
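A small sketch of fail-fast configuration loading, assuming these variables are read at start-up; `ENV.fetch` raises `KeyError` when a required variable is missing, so a misconfigured environment fails immediately rather than mid-execution.

```ruby
# Required endpoints; any missing variable raises KeyError at load time.
EXPERIAN_API_BASE_URL   = ENV.fetch("EXPERIAN_API_BASE_URL")
EXPERIAN_TOKEN_URL      = ENV.fetch("EXPERIAN_TOKEN_URL")
CREDITSAFE_API_BASE_URL = ENV.fetch("CREDITSAFE_API_BASE_URL")
CREDITSAFE_TOKEN_URL    = ENV.fetch("CREDITSAFE_TOKEN_URL")
```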
Authentication Credentials
Provider credentials are supplied via environment variables and are read at runtime by the relevant pipeline.
Experian
- `EXPERIAN_USERNAME`
- `EXPERIAN_PASSWORD`
- `EXPERIAN_CLIENT_ID`
- `EXPERIAN_CLIENT_SECRET`
Creditsafe
- `CREDITSAFE_USERNAME`
- `CREDITSAFE_PASSWORD`
- `CREDITSAFE_CLIENT_ID`
- `CREDITSAFE_CLIENT_SECRET`
Credentials are never logged and are only used within the integration layer for token acquisition and API authentication.
Timeout Configuration
All outbound HTTP calls to third-party providers are subject to a configurable open/read timeout.
- Default timeout: 30 seconds
- Controlled via runtime execution options
- Applies to both authentication and API request calls
Timeouts ensure that external provider latency does not cause indefinite blocking within the integration service.
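A sketch of applying the timeout to an outbound call is shown below; the helper and endpoint are illustrative, and the 30-second default would normally come from the `http_timeout_seconds` execution option.

```ruby
require "net/http"
require "uri"

def get_with_timeout(url, timeout_seconds: 30)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == "https",
                  open_timeout: timeout_seconds,  # time allowed to open the connection
                  read_timeout: timeout_seconds) do |http|
    http.get(uri.request_uri)
  end
end
```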
Retry & Backoff Configuration
Integration pipelines apply bounded retry logic for transient failures (for example, network issues or temporary provider unavailability).
- Default maximum retry attempts: 4
- Backoff strategy: exponential backoff with a bounded maximum delay
- Retries are applied only where safe and appropriate
Credit-impacting actions (such as credit report retrieval) are explicitly guarded to prevent unintended repeated calls.
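The following sketch shows bounded retries with capped exponential backoff, using the option names from section 4.2.2; the retried exception classes and the call site are illustrative, and credit-impacting calls would not be wrapped in this way.

```ruby
require "net/http"
require "socket"
require "timeout"

def with_retries(max_attempts: 4, backoff_base: 1.0, backoff_max: 30.0)
  attempt = 0
  begin
    attempt += 1
    yield
  rescue Net::OpenTimeout, Net::ReadTimeout, SocketError, Timeout::Error
    raise if attempt >= max_attempts # bounded: give up after the configured attempts

    delay = [backoff_base * (2**(attempt - 1)), backoff_max].min # exponential, capped
    sleep(delay)
    retry
  end
end

# Illustrative call site, for an idempotent, non-credit-impacting request:
# with_retries(max_attempts: 4) { perform_company_search }
```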
Feature Flags & Execution Modes
The integration pipelines support a small number of execution modes controlled via runtime options rather than environment configuration.
Examples include:
- Dry-run mode, which logs intended external API calls without executing them
- Debug mode, which enables additional diagnostic logging under controlled conditions
These modes are intended for testing, troubleshooting, and operational support and are not enabled by default.
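A sketch of how these modes might gate an external call is shown below; the method and key names are illustrative rather than the actual pipeline interface.

```ruby
require "logger"

def execute_api_call(request, options, logger)
  if options[:dry_run]
    # Dry run: log the intended call and stop before anything leaves the service.
    logger.info("DRY RUN: would call #{request[:endpoint]} with #{request[:params].inspect}")
    return nil
  end

  response = yield                                    # perform the real external call
  logger.debug(response.inspect) if options[:debug]   # raw response logged only under debug
  response
end

logger = Logger.new($stdout)
execute_api_call({ endpoint: "/companies", params: { regNo: "01234567" } },
                 { dry_run: true, debug: false },
                 logger)
```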
Configuration Management Principles
- All configuration is externalised via environment variables
- No provider endpoints or credentials are hard-coded
- Defaults are conservative and aligned with provider guidance
- Configuration changes do not require code changes