May 05, 2026 · 12 min read

Rejourney Swift Package Is Now in Open Beta

The session state machine, two start paths, URLProtocol swizzle, visual capture backpressure, ANR ping-pong, and crash recovery checkpoints — how the native iOS SDK actually works.


The native Rejourney Swift Package is now in open beta. This article covers how the recorder actually works: the session state machine, the two start paths, how we intercept network traffic without intercepting our own uploads, what happens to a session that dies mid-recording, and why the ANR sentinel lives on a separate thread.

The package targets iOS 15.1+, requires Swift tools 5.9, and links only libz. There is no CocoaPods podspec, no JavaScript runtime, and no React Native dependency.

01 // SESSION STATE MACHINE

Five States, One Controller

RejourneyNativeController is a @MainActor singleton that owns all session transitions. Its state is a Swift enum with five cases:

private enum SessionState: Equatable {
    case idle
    case starting(sessionId: String)
    case active(sessionId: String)
    case paused(sessionId: String, backgroundedAt: TimeInterval)
    case terminated
}

The starting case uses a "pending_\(timestampMs)" placeholder ID. A 5-second poll loop (50 iterations × 100 ms) waits for ReplayOrchestrator.shared.replayId to become non-nil before transitioning to active. If the orchestrator never produces an ID, usually because the credential fetch failed, the controller drops back to idle and disables URL interception.
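
The bounded wait described above can be sketched as a small async helper. This is a minimal illustration, not the SDK's code: pollForValue and its parameter names are hypothetical, and the real controller reads ReplayOrchestrator.shared.replayId rather than an injected closure.

```swift
import Foundation

/// Hypothetical helper: calls `read` up to `attempts` times, sleeping
/// `intervalMs` milliseconds after each miss. Returns the first non-nil
/// value, or nil once the budget (50 × 100 ms = 5 s by default) is spent.
func pollForValue<T>(
    attempts: Int = 50,
    intervalMs: UInt64 = 100,
    read: () async -> T?
) async -> T? {
    for _ in 0..<attempts {
        if let value = await read() { return value }
        // Ignore cancellation errors here; the loop simply ends sooner.
        try? await Task.sleep(nanoseconds: intervalMs * 1_000_000)
    }
    return nil
}
```

A nil result corresponds to the fallback path in the text: return to idle and turn URL interception off.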

Background/foreground is handled by two NotificationCenter observers wired in setupLifecycleListeners(). When the app backgrounds, state moves to paused(sessionId:, backgroundedAt:) with the current Unix timestamp. On foreground the controller computes the elapsed duration and compares it against a 60-second timeout. Under the threshold the session resumes. Over it, the controller races two triggers, a 2-second DispatchWorkItem grace timer and the endReplayWithReason("background_timeout") completion callback, so that a fresh session starts without blocking on the prior session's upload.
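
The resume-or-rollover decision made on foreground can be sketched as a pure function. ForegroundAction and foregroundAction are hypothetical names used only for illustration, and the handling of an elapsed time of exactly 60 seconds is an assumption, since the article only says "under" and "over".

```swift
import Foundation

/// Hypothetical result type for the foreground check.
enum ForegroundAction: Equatable {
    case resume    // short background: keep the same session
    case rollover  // long background: end the old session, start a new one
}

/// `backgroundedAt` and `now` are Unix timestamps in seconds.
func foregroundAction(
    backgroundedAt: TimeInterval,
    now: TimeInterval,
    timeout: TimeInterval = 60
) -> ForegroundAction {
    // Clamp at zero in case the wall clock moved backwards while suspended.
    let elapsed = max(0, now - backgroundedAt)
    // Assumption: exactly `timeout` seconds counts as a rollover.
    return elapsed < timeout ? .resume : .rollover
}
```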

Session rollover (simplified)
var restartStarted = false
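let triggerRestart: (String) -> Void = { source in
    // Hop to the main queue so restartStarted is only read and written on
    // one thread; the end-replay callback below may arrive on another queue.
    DispatchQueue.main.async {
        guard !restartStarted else { return }
        restartStarted = true
        Task { @MainActor in await self.startNewSessionAfterTimeout() }
    }
}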
// Grace path: fire after 2s if callback hasn't arrived yet
DispatchQueue.main.asyncAfter(deadline: .now() + 2) {
    triggerRestart("grace_timeout")
}
// Callback path: fire as soon as old session is finalized
DispatchQueue.global(qos: .utility).async {
    ReplayOrchestrator.shared.endReplayWithReason("background_timeout") { _, _ in
        triggerRestart("end_replay_callback")
    }
}
02 // START PATHS

Fast Restart vs. Full Credential Fetch

Every call to Rejourney.start() first hits /api/sdk/config to fetch remote configuration — sampleRate, recordingEnabled, maxRecordingMinutes, and billing state. The response determines whether visual capture runs at all. A 401/403/404 is treated as hard denial and returns RejourneyStartResult(success: false, error: "access_denied_\(statusCode)"). A network failure falls back to RejourneyRemoteConfig.defaultConfig and continues with local defaults.
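
The three outcomes above can be sketched as a classifier over the HTTP status. ConfigOutcome and classifyConfigResponse are hypothetical names, and the treatment of status codes the article does not mention (for example 500) is an assumption.

```swift
import Foundation

/// Hypothetical outcome of the /api/sdk/config request.
enum ConfigOutcome: Equatable {
    case denied(error: String)  // hard denial: start() reports failure
    case useRemote              // parse and apply the response body
    case useDefaults            // continue with local defaults
}

/// `statusCode` is nil when the request failed at the transport level.
func classifyConfigResponse(statusCode: Int?) -> ConfigOutcome {
    guard let statusCode else { return .useDefaults }
    switch statusCode {
    case 401, 403, 404:
        return .denied(error: "access_denied_\(statusCode)")
    case 200..<300:
        return .useRemote
    default:
        // Assumption: other statuses are treated like a network failure.
        return .useDefaults
    }
}
```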

After remote config is resolved, the orchestrator needs upload credentials. There are two code paths:

Cold start — beginReplay

Calls DeviceRegistrar.shared.obtainCredential, which performs a credential handshake and stores the result in Keychain. Only then does it start an NWPathMonitor and wait for a satisfied network path before _beginRecording is called.

Warm restart — beginReplayFast

Uses a cached Keychain credential directly. Skips the credential fetch and the network monitor startup entirely. Calls _beginRecording on the main queue synchronously — measurably faster for the session-rollover case after a background timeout.

Sample rate enforcement happens in RejourneySessionPolicy.derive. It draws a Double.random(in: 0..<100) and compares it against the remote sampleRate integer. Sessions that are sampled out return before replay, network interception, or capture is started, so the SDK avoids the native recording path for that launch.
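
A minimal sketch of that draw, assuming the session is kept when the random value is strictly below sampleRate. The function name isSampledIn and the clamping to 0...100 are illustrative assumptions, not the contents of RejourneySessionPolicy.derive.

```swift
import Foundation

/// Hypothetical sampling check. The generator is injectable so the
/// decision can be made deterministic in tests.
func isSampledIn<G: RandomNumberGenerator>(sampleRate: Int, using rng: inout G) -> Bool {
    let clamped = min(max(sampleRate, 0), 100)
    // draw is in [0, 100), so a rate of 100 always passes and 0 never does.
    let draw = Double.random(in: 0..<100, using: &rng)
    return draw < Double(clamped)
}

/// Convenience overload using the system generator.
func isSampledIn(sampleRate: Int) -> Bool {
    var rng = SystemRandomNumberGenerator()
    return isSampledIn(sampleRate: sampleRate, using: &rng)
}
```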

03 // NETWORK INTERCEPTION

URLProtocol Registration and the Swizzle Problem

URLProtocol.registerClass(RejourneyURLProtocol.self) covers URLSession.shared and any session created from the default configuration. It does not cover sessions built with a custom URLSessionConfiguration — which is exactly what SDWebImage, Alamofire, and most third-party SDKs use. To reach those, we swizzle the protocolClasses getter on URLSessionConfiguration itself.

Swizzle — add method onto URLSessionConfiguration, then exchange
let didAdd = class_addMethod(
    URLSessionConfiguration.self,
    swizzledSel,
    method_getImplementation(swizzledMethod),
    method_getTypeEncoding(swizzledMethod)
)
if didAdd, let addedMethod = class_getInstanceMethod(configClass, swizzledSel) {
    originalProtocolClassesIMP = method_getImplementation(originalMethod)
    method_exchangeImplementations(originalMethod, addedMethod)
}

The replacement getter calls through to the original IMP via a saved function pointer, then inserts RejourneyURLProtocol at index 0 if not already present. This means every URLSessionConfiguration instance — existing or future — gets the protocol injected at the point it queries its class list.
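
The list manipulation inside the replacement getter can be sketched on its own, without the Objective-C runtime calls. The helper name injecting is hypothetical; the sketch only shows the "insert at index 0 if absent" step.

```swift
import Foundation

/// Hypothetical helper: returns `existing` with `cls` at index 0,
/// leaving the list unchanged if the class is already present.
func injecting(_ cls: AnyClass, into existing: [AnyClass]?) -> [AnyClass] {
    var list = existing ?? []
    // Compare class identity rather than relying on Equatable.
    let alreadyPresent = list.contains { ObjectIdentifier($0) == ObjectIdentifier(cls) }
    if !alreadyPresent {
        list.insert(cls, at: 0)
    }
    return list
}
```

Because the check is idempotent, calling the getter repeatedly on the same configuration never duplicates the entry.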

Self-interception is prevented by stamping forwarded requests with a property under the key "co.rejourney.handled". canInit(with:) returns false immediately if that property is set. The forwarding session itself is initialized from URLSessionConfiguration.ephemeral with protocolClasses = [], so even the swizzled getter produces an empty list for our internal session.
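
The stamping scheme relies on two Foundation calls, URLProtocol.setProperty(_:forKey:in:) and URLProtocol.property(forKey:in:). A minimal sketch follows; the free functions shouldIntercept and stamped are hypothetical stand-ins for what canInit(with:) and the forwarding path would do.

```swift
import Foundation

let handledKey = "co.rejourney.handled"

/// Mirrors the check described for canInit(with:): skip stamped requests.
func shouldIntercept(_ request: URLRequest) -> Bool {
    URLProtocol.property(forKey: handledKey, in: request) == nil
}

/// Returns a copy of `request` carrying the handled marker.
func stamped(_ request: URLRequest) -> URLRequest {
    // setProperty only accepts NSMutableURLRequest, so bridge and copy first.
    guard let mutable = (request as NSURLRequest).mutableCopy() as? NSMutableURLRequest else {
        return request
    }
    URLProtocol.setProperty(true, forKey: handledKey, in: mutable)
    return mutable as URLRequest
}
```

The marker lives on the request object only and is not sent over the wire as a header.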

The original implementation created a new URLSession per intercepted request, which leaked 1–3 MB per request under heavy traffic. The current design uses one shared forwarding session with a SessionDelegateAdapter that routes callbacks through an NSMapTable<URLSessionTask, RejourneyURLProtocol>.strongToWeakObjects(). The weak value side means protocol instances that are stopped by the URL loading system get collected without a leak, and the map never accumulates stale entries.

04 // VISUAL CAPTURE

Main-Thread Reads, Background Encodes, Backpressure Limits

UIKit requires that drawHierarchy(in:afterScreenUpdates:) runs on the main thread. There is no way around this. What we can control is how much time we spend there and how we handle encode and upload without blocking the render pipeline.

Screenshots are taken at a configurable interval (default 0.33s, translating to roughly 3 fps) and immediately handed off to a serial OperationQueue named "co.rejourney.encode" with .utility QoS. JPEG compression runs entirely on that queue. The main thread is only involved for the initial pixel read — not for compression, buffering, or upload.

Two backpressure limits protect against queue growth under slow network conditions: 50 pending encode batches and 500 buffered screenshots. Frames are dropped — not queued indefinitely — when either limit is reached. The capture scale is 1.25, which means the framebuffer is read at 80% of linear screen size before JPEG encoding, matching the ratio used by the Android recorder.
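
A sketch of one of these limits as a bounded buffer. BoundedFrameBuffer is a hypothetical type, and dropping the incoming frame (rather than evicting the oldest) is an assumption; the article only says frames are dropped. The capturedLength helper illustrates the scale arithmetic, assuming the capture scale acts as a divisor.

```swift
import Foundation

/// Hypothetical bounded buffer for encoded screenshots.
struct BoundedFrameBuffer {
    let maxFrames: Int
    private(set) var frames: [Data] = []
    private(set) var droppedCount = 0

    init(maxFrames: Int = 500) { self.maxFrames = maxFrames }

    /// Appends a frame, or counts a drop and returns false when full.
    @discardableResult
    mutating func append(_ frame: Data) -> Bool {
        guard frames.count < maxFrames else {
            droppedCount += 1  // assumption: the newest frame is the one dropped
            return false
        }
        frames.append(frame)
        return true
    }

    /// Hands all buffered frames to the uploader and empties the buffer.
    mutating func drain() -> [Data] {
        defer { frames.removeAll(keepingCapacity: true) }
        return frames
    }
}

/// A capture scale of 1.25 reads each dimension at 1 / 1.25 = 80%.
func capturedLength(_ points: Double, captureScale: Double = 1.25) -> Double {
    points / captureScale
}
```

For example, a 390-point-wide screen is read at 312 points before encoding.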

One non-obvious guard: we skip drawHierarchy while the keyboard is animating. Calling it during a keyboard transition causes UIKit to stall the main thread — we measured 7+ seconds — while it resolves conflicting layout constraints between the keyboard window and the app window. We observe both keyboardWillShow and keyboardWillHide, and only resume capture 0.45 seconds after keyboardDidShow or keyboardDidHide fires.
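
The keyboard guard can be sketched as a small gate driven by the four notifications. KeyboardCaptureGate is a hypothetical type; timestamps are passed in explicitly so the logic stays independent of NotificationCenter and timers.

```swift
import Foundation

/// Hypothetical gate consulted before each drawHierarchy call.
struct KeyboardCaptureGate {
    private var animating = false
    private var resumeAt: TimeInterval = 0
    let settleDelay: TimeInterval

    init(settleDelay: TimeInterval = 0.45) { self.settleDelay = settleDelay }

    /// Call from keyboardWillShow and keyboardWillHide.
    mutating func keyboardWillChange() { animating = true }

    /// Call from keyboardDidShow and keyboardDidHide.
    mutating func keyboardDidChange(at now: TimeInterval) {
        animating = false
        resumeAt = now + settleDelay
    }

    /// True when no keyboard transition is running and the delay has passed.
    func canCapture(at now: TimeInterval) -> Bool {
        !animating && now >= resumeAt
    }
}
```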

View hierarchy snapshots run on a separate Timer scheduled in the default run loop mode — intentionally not .common. This lets the timer pause during scrolling, preventing main-thread pressure from a hierarchy walk through deep subviews while the user is actively scrolling. Hierarchy snapshots are also skipped when MapKit is visible and actively animating; the Metal and OpenGL subview tree under an animating map adds meaningful main-thread cost to a full hierarchy scan. Deduplication uses a cheap hash of the current screen name and root child count — if neither changes, the snapshot is not uploaded.
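
The deduplication step can be sketched as follows. The type names are hypothetical, and the sketch compares the two fields directly instead of hashing them, which gives the same skip-if-unchanged behavior.

```swift
import Foundation

/// Hypothetical fingerprint: the two inputs named in the text.
struct HierarchyFingerprint: Equatable {
    let screenName: String
    let rootChildCount: Int
}

/// Remembers the last uploaded fingerprint and rejects repeats.
struct SnapshotDeduplicator {
    private var last: HierarchyFingerprint?

    mutating func shouldUpload(screenName: String, rootChildCount: Int) -> Bool {
        let current = HierarchyFingerprint(screenName: screenName, rootChildCount: rootChildCount)
        guard current != last else { return false }  // nothing changed: skip
        last = current
        return true
    }
}
```

A fingerprint this coarse will miss changes deeper in the tree; that is the trade-off for keeping the check cheap.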

05 // ANR DETECTION

A Ping-Pong Sentinel on a Dedicated Thread

AnrSentinel runs a watch loop on a dedicated Thread named "co.rejourney.anr" at .utility QoS. Every 2 seconds it posts a block to DispatchQueue.main.async and records the dispatch time. When the main queue actually executes the block it stamps ProcessInfo.processInfo.systemUptime as the response time. If 2 seconds pass without a response and the delta exceeds the 5-second freeze threshold, the sentinel declares an ANR.

State shared between the watch thread and the main thread is protected by os_unfair_lock, which is appropriate here because the critical sections are short (a handful of struct assignments) and the lock is never held across I/O. A lastAnrReport timestamp prevents duplicate reports while a single long freeze persists — if the freeze hasn't cleared for another 5-second window, the sentinel stays quiet.
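
The reporting decision can be sketched as a pure function over the shared timestamps. This is one reading of the behavior described above, not the sentinel's actual code: AnrWatchState and shouldReportAnr are hypothetical, the sketch assumes lastPingAt refers to the oldest unanswered ping, and it interprets the duplicate guard as "at most one report per threshold window". Locking is omitted because the function takes a copied snapshot.

```swift
import Foundation

/// Hypothetical snapshot of the state the lock would protect.
/// All times are systemUptime-style seconds.
struct AnrWatchState {
    var lastPingAt: TimeInterval = 0        // when the watch thread posted the block
    var lastPongAt: TimeInterval = 0        // when the main queue last ran it
    var lastReportAt: TimeInterval? = nil   // when an ANR was last reported
}

func shouldReportAnr(
    state: AnrWatchState,
    now: TimeInterval,
    freezeThreshold: TimeInterval = 5
) -> Bool {
    // The main thread has answered the latest ping: no freeze.
    guard state.lastPongAt < state.lastPingAt else { return false }
    // The ping is outstanding but not yet for long enough.
    guard now - state.lastPingAt >= freezeThreshold else { return false }
    // Assumption: suppress repeats inside one threshold window.
    if let last = state.lastReportAt, now - last < freezeThreshold { return false }
    return true
}
```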

On ANR detection, Thread.callStackSymbols is captured and the incident is handed to StabilityMonitor, which persists it to a JSON file in the caches directory. This mirrors how crash reports survive process termination: if the app is killed while an ANR is in progress, the next session start will find and upload the stored incident.

06 // CRASH RECOVERY

Checkpoints, Recovery Files, and the Close Anchor

When a session starts, the orchestrator writes a checkpoint to rejourney_recovery.json in the app's Documents directory. The file contains the session ID, start timestamp, API token, endpoint, upload credential, and a timingVersion field (currently 3). A background DispatchSourceTimer fires every 5 seconds on a .utility queue to update lastActiveCheckpointMs and re-write the file. The timer does not fire while the app is backgrounded, so the file always reflects the last known foreground timestamp.
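
A minimal sketch of writing and reading such a checkpoint with Codable. RecoveryCheckpoint and its property names are illustrative assumptions (only lastActiveCheckpointMs and timingVersion are named in the text), and the credential fields are left out of the sketch.

```swift
import Foundation

/// Hypothetical on-disk shape; the real file has more fields.
struct RecoveryCheckpoint: Codable, Equatable {
    var sessionId: String
    var startedAtMs: Int64
    var lastActiveCheckpointMs: Int64
    var timingVersion: Int = 3
}

/// Writes atomically so a crash mid-write cannot leave a truncated file.
func writeCheckpoint(_ checkpoint: RecoveryCheckpoint, to url: URL) throws {
    let data = try JSONEncoder().encode(checkpoint)
    try data.write(to: url, options: .atomic)
}

func readCheckpoint(from url: URL) throws -> RecoveryCheckpoint {
    try JSONDecoder().decode(RecoveryCheckpoint.self, from: Data(contentsOf: url))
}
```

The periodic timer would then bump lastActiveCheckpointMs and call writeCheckpoint again.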

On the next app launch, recoverInterruptedReplay reads the file, re-hydrates SegmentDispatcher with the stored credentials, and calls VisualCapture.shared.uploadPendingFrames for any frames that were buffered to disk but not uploaded. Only after those frames are confirmed uploaded does it call SegmentDispatcher.concludeReplay with endReason: "recovery_finalize".

The closeAnchorAtMs parameter in the finalize call is where timingVersion matters. Version 3 semantics: for a "background_timeout" end reason, the close anchor is set to lastBackgroundEntryMs — the exact moment the app last entered the background — rather than the crash recovery time. This keeps the session duration accurate in the replay timeline even when the finalize call happens minutes or hours later.
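
The version 3 rule can be sketched as a small selector. The function name closeAnchorMs is hypothetical, and the fallback to the last active checkpoint for other end reasons is an assumption; the article only specifies the "background_timeout" case.

```swift
import Foundation

/// Hypothetical anchor selection for the finalize call.
func closeAnchorMs(
    endReason: String,
    lastBackgroundEntryMs: Int64?,
    lastActiveCheckpointMs: Int64
) -> Int64 {
    if endReason == "background_timeout", let backgrounded = lastBackgroundEntryMs {
        // Anchor at the moment the app entered the background.
        return backgrounded
    }
    // Assumption: otherwise use the last foreground checkpoint.
    return lastActiveCheckpointMs
}
```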

07 // PUBLIC API

The Core Public Surface

The Rejourney enum is @MainActor and exposes both async and callback-based overloads for start() and stop() so UIKit apps without Swift concurrency adoption can still call it from an AppDelegate.

// Configure — call before start, safe to call multiple times
Rejourney.configure(publicKey: "rj_...", options: .init(
    wifiOnly: false,
    captureANR: true,
    autoTrackNetwork: true
))

// Async start — returns RejourneyStartResult with sessionId + telemetryOnly flag
let result = await Rejourney.start()

// Identity — persisted to UserDefaults, restored across sessions
Rejourney.identify("user_abc123")

// Screen tracking — queued before session is ready, replayed on active
Rejourney.trackScreen("Checkout")

// Custom events — typed properties accept Swift literals directly
Rejourney.logEvent("checkout_started", properties: ["plan": "pro"])

// View-level redaction — registered in VisualCapture's RedactionMask
Rejourney.mask(sensitiveLabel)

// Graceful stop — drains and finalizes the session
let stopResult = await Rejourney.stop()

Custom event properties use RejourneyMetadataValue, an indirect enum with ExpressibleByStringLiteral, ExpressibleByIntegerLiteral, ExpressibleByFloatLiteral, ExpressibleByBooleanLiteral, and ExpressibleByNilLiteral conformances. You can pass a string, int, double, bool, array, nested object, or nil literal directly without wrapping.
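
A sketch of how such a literal-friendly enum can be built. MetadataValue is a hypothetical stand-in for RejourneyMetadataValue with assumed case names. The array and dictionary literal conformances are also assumptions: they are what Swift needs in order to accept array and nested-object literals, and they are not in the list above.

```swift
import Foundation

indirect enum MetadataValue: Equatable {
    case string(String), int(Int), double(Double), bool(Bool)
    case array([MetadataValue]), object([String: MetadataValue]), null
}

extension MetadataValue: ExpressibleByStringLiteral {
    init(stringLiteral value: String) { self = .string(value) }
}
extension MetadataValue: ExpressibleByIntegerLiteral {
    init(integerLiteral value: Int) { self = .int(value) }
}
extension MetadataValue: ExpressibleByFloatLiteral {
    init(floatLiteral value: Double) { self = .double(value) }
}
extension MetadataValue: ExpressibleByBooleanLiteral {
    init(booleanLiteral value: Bool) { self = .bool(value) }
}
extension MetadataValue: ExpressibleByNilLiteral {
    init(nilLiteral: ()) { self = .null }
}
// Assumed conformances for array and nested-object literals.
extension MetadataValue: ExpressibleByArrayLiteral {
    init(arrayLiteral elements: MetadataValue...) { self = .array(elements) }
}
extension MetadataValue: ExpressibleByDictionaryLiteral {
    init(dictionaryLiteral elements: (String, MetadataValue)...) {
        // Later duplicates win instead of trapping on repeated keys.
        self = .object(Dictionary(elements, uniquingKeysWith: { _, last in last }))
    }
}
```

With these in place, a dictionary typed as [String: MetadataValue] accepts mixed literals such as ["plan": "pro", "seats": 3, "tags": ["a", "b"]] without explicit wrapping.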

Screen names tracked before start() returns are queued in RejourneySessionContext (capped at 50 entries, consecutive duplicates removed). When the session becomes active, the queue is drained and each screen is replayed as a telemetry view transition event, so pre-start navigation appears correctly in the replay timeline.
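
The pre-start queue can be sketched like this. PendingScreenQueue is a hypothetical type, and evicting the oldest entry when the cap is reached is an assumption; the article states the cap and the duplicate rule but not the eviction policy.

```swift
import Foundation

/// Hypothetical queue for screen names seen before the session is active.
struct PendingScreenQueue {
    private(set) var names: [String] = []
    let capacity: Int

    init(capacity: Int = 50) { self.capacity = capacity }

    mutating func enqueue(_ name: String) {
        // Collapse consecutive duplicates such as repeated onAppear calls.
        guard names.last != name else { return }
        if names.count == capacity {
            names.removeFirst()  // assumption: drop the oldest entry
        }
        names.append(name)
    }

    /// Returns the queued names in order and clears the queue.
    mutating func drain() -> [String] {
        defer { names.removeAll() }
        return names
    }
}
```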

08 // RELEASE MODEL

What Is Beta and What Is Not

The recorder, the ingest protocol, the session lifecycle semantics, and the privacy defaults are production-quality — they have been exercised through the React Native SDK at scale. What we are collecting signal on in this beta:

  • SwiftPM resolution behavior across real Xcode versions and enterprise CI caches.
  • App extension edge cases — the shared UserDefaults and Keychain access groups behave differently under extension sandboxing.
  • SwiftUI navigation patterns: since SwiftUI has no equivalent of UIKit's viewDidAppear, we want to understand how teams prefer to wire trackScreen: .onAppear, NavigationStack path observation, or a custom modifier.
  • Whether the PrivacyInfo.xcprivacy manifest is being picked up correctly by App Store submission pipelines.

Native iOS versioning is independent from the React Native package. Tags follow plain semver (v0.2.0). A CI check validates that packages/ios/VERSION and RejourneySDKInfo.version are in sync before a tag is created.