
DigiFrens Technical Documentation

Version 3.1 | April 2026


1. Executive Summary

DigiFrens is a sophisticated iOS AI companion application that combines animated avatars, intelligent memory systems, and natural voice interaction to create meaningful digital relationships. Unlike conventional AI assistants optimized for productivity, DigiFrens is designed around emotional connection - companions that remember, evolve, and respond to users as individuals.

The application features a triple avatar engine (3D VRM, 2D Live2D, and photorealistic Gaussian Splatting), a five-phase emotional memory system with cognitive graph integration, six AI providers including free on-device Apple Intelligence, and premium voice synthesis. All conversation data is stored locally on the user's device, with no cloud dependency for core functionality.

Key Differentiators:

  • Privacy-first architecture - conversations never leave the device
  • Living User Model (LUM) - a cognitive graph that models the user's beliefs, values, and goals
  • Cognitive Memory Pipeline - spreading activation retrieval inspired by human associative memory
  • On-device avatar creation - single selfie to photorealistic animated avatar via CoreML
  • Adaptive conversation intelligence - 11-state mental process that responds to emotional context

2. Product Overview

What is DigiFrens?

DigiFrens creates AI entities with visual presence, unique voices, persistent memories, and consistent personalities. Each companion provides judgment-free interaction and emotional support through natural conversation.

Core Experience

Users interact with animated avatar companions through text or voice. Each conversation is enriched by:

  • Visual presence - avatars express emotions in real-time through facial expressions, gestures, and idle animations
  • Voice interaction - premium neural voice synthesis with synchronized lip movements
  • Persistent memory - companions remember personal details, past conversations, and emotional patterns
  • Personality evolution - avatar traits shift based on conversation dynamics over time
  • Proactive intelligence - companions initiate follow-ups, check-ins, and celebrations based on life events

User Flow

  1. Launch - automatic device-based authentication (no sign-up required)
  2. Landing page - horizontal carousel of available avatars with recent conversation previews
  3. Avatar selection - tap to start or resume a conversation
  4. Conversation - text or voice input with real-time avatar responses
  5. Settings - configure AI provider, voice, subscription, and security

Available Companions

Name   | Engine         | Personality
Haru   | VRM (3D)       | Cool, introspective, main character energy
Emi    | VRM (3D)       | Kind, cheerful, playful, optimistic
Hiyori | Live2D (2D)    | Cheerful, energetic, bubbly, studious
Mao    | Live2D (2D)    | Mischievous, playful, witty, curious
Custom | Gaussian Splat | User-created from selfie, personality selectable

3. Technical Architecture

Platform

  • iOS 26.0+ (iPhone 11 or newer)
  • Swift 6.0+ / Xcode 16.0+
  • iPhone 15 Pro+ for Apple Intelligence features
  • ~500MB storage (app + models + CoreML cache)

Architecture

The codebase follows MVVM architecture with 14 service domains covering AI, memory, emotion, voice, calendar, avatar management, and more. Three dedicated engine modules handle avatar rendering (VRM, Live2D, Gaussian Splatting), each conforming to a shared protocol for unified expression control.

Key Technologies

Technology        | Purpose
SwiftUI           | Declarative UI with @Observable macro
SceneKit          | VRM 3D rendering (Metal backend)
Live2D Cubism SDK | 2D avatar animation (Metal renderer)
MetalSplatter     | GPU-accelerated Gaussian splat rendering (Metal)
CoreML            | On-device embeddings (GTE-Small) + avatar reconstruction (LAM + FLAME)
Foundation Models | Apple Intelligence on-device AI (iOS 26+)
SQLite            | Local memory and LUM graph persistence
StoreKit 2        | Subscription management
EventKit          | Calendar integration
Vision            | Face detection for avatar capture
AVFoundation      | Audio recording, camera, speech

Conversation Flow

The message-to-response pipeline:

  1. User input - the user sends a message
  2. ConversationViewModel - stores the memory (returns a UUID), analyzes emotion (user state), extracts LUM data (beliefs, values, goals), and creates memory → LUM edges
  3. ContextBuilder (parallel assembly) - LUM context (beliefs, mood, chapter), mental process (OpenSouls states), memory retrieval (9 strategies), calendar (upcoming events), personality (HEXACO traits), plus relevant memories, emotional history, shared language, and the avatar blueprint
  4. AIService - sends system prompt + context to the selected provider (Apple Intelligence, OpenAI, Claude, Local LEAP, OpenRouter, or OpenClaw), which returns a response + emotion
  5. Avatar response - voice synthesis (ElevenLabs or system), lip sync (viseme synchronization), and emotion expression (VRM / Live2D / Gaussian)

Data Layer

Store        | Contents                                                       | Security
SQLite       | Memories, LUM graph, emotions, sessions, personality evolution | Device-local
Keychain     | API keys, device ID, passkey credentials                       | Hardware-encrypted
UserDefaults | Preferences, settings                                          | Standard

4. Avatar System

DigiFrens features a triple-engine avatar system, with each engine optimized for a different use case.

Engine Architecture

The three engines and their avatars:

  • VRM Engine (SceneKit + Metal) - Haru, Emi (~35MB)
  • Live2D Engine (Cubism SDK + Metal) - Hiyori, Mao (~32MB)
  • Gaussian Splat Engine (MetalSplatter) - Custom (selfie, ~1.2GB model)

All three engines conform to AvatarEngineProtocol, providing a unified interface for expression control, lip sync, and idle animation.
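The shared interface can be pictured as a small Swift protocol. This is an illustrative sketch only - the protocol name comes from the docs, but the method names, the AvatarEmotion enum, and the StubEngine conformance are invented for the example:

```swift
import Foundation

// Sketch of the unified engine interface described above. Method names
// and signatures are illustrative assumptions, not the actual DigiFrens API.
enum AvatarEmotion: String {
    case happy, sad, angry, surprised, excited, confused, neutral
}

protocol AvatarEngineProtocol {
    /// Blend toward an emotion expression over `duration` seconds.
    func setExpression(_ emotion: AvatarEmotion, duration: TimeInterval)
    /// Drive the mouth with a viseme index (0..<5) and weight (0...1).
    func setViseme(index: Int, weight: Float)
    /// Advance idle animation (breathing, blinking, sway) by one frame.
    func updateIdle(deltaTime: TimeInterval)
}

// A stub conformance shows how any of the three engines plugs in.
final class StubEngine: AvatarEngineProtocol {
    private(set) var currentEmotion: AvatarEmotion = .neutral
    func setExpression(_ emotion: AvatarEmotion, duration: TimeInterval) {
        currentEmotion = emotion
    }
    func setViseme(index: Int, weight: Float) {}
    func updateIdle(deltaTime: TimeInterval) {}
}
```

Because all three engines share this surface, the conversation layer can request an expression or viseme without knowing which renderer is active.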

VRM Engine (3D)

Avatars: Haru, Emi (~35MB total)

The VRM engine renders 3D avatar models using SceneKit with a Metal backend. It provides:

  • 60 FPS unified animation loop with layered blending
  • 7 emotion expressions (happy, sad, angry, surprised, excited, confused, neutral)
  • 5 viseme shapes for lip synchronization
  • Idle animations - breathing, blinking, head movement, body sway
  • Physics simulation - hair and clothing dynamics
  • NSCache with 100MB byte limit and automatic memory pressure handling

Live2D Engine (2D)

Avatars: Hiyori, Mao (~32MB)

The Live2D engine uses the Cubism SDK with a Metal renderer (migrated from OpenGL ES for iOS 26 compatibility). It features:

  • Emotion-driven facial animations with parameter mapping
  • Swift-to-C++ bridge via Objective-C++ (Live2DBridge.mm)
  • Metal rendering via Live2DMetalView

Gaussian Splatting Engine (Photorealistic Custom Avatars)

Status: In development (core pipeline complete)

The Gaussian Splatting engine enables users to create photorealistic animated avatars from a single selfie photo. The pipeline combines three key technologies: FLAME, a statistical 3D morphable face model, provides structured facial geometry and an animation rig; LAM (Large Avatar Model), a feed-forward transformer, predicts animatable 3D Gaussians from a single image; and a Metal renderer rasterizes the Gaussians at real-time frame rates on Apple hardware.

How It Works: Single Photo to Animatable 3D Avatar

The reconstruction pipeline runs as a single forward pass through the LAM network — no iterative optimization, no multi-view capture, no cloud processing.

Step 1 — Feature Extraction. A frozen DINOv2 vision backbone extracts multi-scale image features from the input photo. Shallow layers capture local detail (skin texture, hair, wrinkles) while deep layers encode global facial structure.

Step 2 — FLAME-Guided Query Construction. FLAME's 5,023 mesh vertices are subdivided twice to produce 81,424 query points that define a dense, topologically consistent sampling of the face and head. Each point is positionally encoded and projected through an MLP into a learnable feature vector. These structured queries give the transformer a strong geometric prior — it knows where to look on the face rather than predicting structure from scratch.

Step 3 — Transformer Cross-Attention. 10 stacked transformer blocks (16 heads, 1024-dim) perform cross-attention between the FLAME-derived point queries and the image features. Each block refines the point representations, progressively resolving identity, texture, and fine geometric detail.

Step 4 — Gaussian Attribute Prediction. MLP decoder heads predict per-point 3D Gaussian attributes in canonical (neutral expression, forward-facing) space:

Attribute       | Dimensions | Purpose
Color (RGB)     | 3          | Appearance
Opacity         | 1          | Transparency / density
Scale           | 1          | Gaussian extent
Rotation        | SO(3)      | Gaussian orientation
Position offset | 3          | Refinement beyond FLAME template (captures hair, accessories, wrinkles)

Step 5 — Animation. Standard FLAME linear blend skinning (LBS) with corrective blendshapes deforms the canonical Gaussians to any target expression and head pose. No neural network runs at animation time — it is purely the classical FLAME skinning pipeline applied to the predicted Gaussians. Expression (100 blendshape coefficients) and pose (jaw, neck, eyeball joints) parameters drive the deformation:

G_canonical = FLAME_template + shape_blendshapes(β) + predicted_offsets
G_posed = LBS(G_canonical, joints, pose_θ, expression_φ, blend_weights)

Step 6 — Metal Rendering. The posed Gaussians are rasterized via MetalSplatter using Apple's Metal graphics API, achieving 30-60 FPS on-device with GPU-accelerated splatting.

On-Device Reconstruction Pipeline

The on-device pipeline from selfie to avatar:

  1. Input - single selfie photo (518×518, center-cropped)
  2. CoreML inference (557.6M params) - DINOv2 ViT-L/14 extracts multi-scale image features, a 10-layer SD3-style transformer decodes them, and the GSLayer MLP emits 20,018 Gaussians × 14 channels (LAM: Large Avatar Model, SIGGRAPH 2025, Apache-2.0, FP16: 1,214.6 MB)
  3. Export - compressed Gaussians written to .spz format
  4. Blendshape generation - flame_arkit_mapping.json maps 52 ARKit blendshapes; FLAME LBS position deltas yield per-Gaussian deformations
  5. Output - GaussianSplatEngine renders at 60 FPS

Why FLAME as the Geometric Prior

FLAME (Faces Learned with an Articulated Model and Expressions) is a statistical 3D morphable face model trained on 33,000+ 3D scans. It decomposes facial variation into three disentangled components:

  • Shape (300 PCA components) — identity-specific geometry (skull shape, nose size, jaw width)
  • Expression (100 blendshape components) — facial expressions (smile, frown, surprise)
  • Pose (articulated joints) — jaw rotation, neck articulation, eyeball gaze

This factored representation means identity, expression, and pose can be varied independently. By reconstructing all Gaussians into FLAME's canonical coordinate space, the network avoids entangling identity with expression — a user's neutral face is always reconstructed the same way regardless of their expression in the input photo.

FLAME also provides dense correspondence: all reconstructed avatars share the same mesh topology, enabling consistent animation control, texture transfer, and blend weight inheritance across every avatar created in the app.

Why 3D Gaussian Splatting

Unlike mesh-based or NeRF-based representations, 3D Gaussian Splatting offers a unique combination of properties ideal for mobile avatar rendering:

  • Explicit representation — each Gaussian has concrete position, color, opacity, and shape parameters that can be directly manipulated by the FLAME animation rig
  • Real-time rasterization — Metal-accelerated splatting avoids the expensive ray marching required by neural radiance fields
  • Photorealistic quality — Gaussians naturally model soft boundaries (hair, skin pores, translucent regions) that triangle meshes struggle with
  • Compact storage — an avatar's 81,424 Gaussians with 14 channels per point stores efficiently on-device

Performance Comparison

Metric          | Previous (Cloud)       | Current (On-Device)
Input           | 60-second guided video | 1 selfie photo
Wait time       | 5-15 minutes           | ~1.4 seconds (single forward pass)
Network         | 200MB up, 10MB down    | Model download only (~1.2GB, one-time)
Cost per avatar | $0.50-1.50 (cloud GPU) | $0
Works offline   | No                     | Yes (after initial download)

Rendering Performance

Platform       | Reconstruction | Animation + Rendering
A100 GPU       | ~1.4s          | 281 FPS
MacBook M1 Pro |                | 120 FPS
iPhone 16      |                | 35 FPS
iPhone 15 Pro  |                | 30+ FPS (target)

The Metal renderer uses GPU-accelerated Gaussian splatting via MetalSplatter, with depth sorting, alpha compositing, and view-dependent rendering handled entirely on the GPU.

LAM Model Details

Property          | Value
Paper             | LAM: Large Avatar Model (SIGGRAPH 2025)
Authors           | He, Gu, Ye, Xu, Zhao, Dong, Yuan, Dong, Bo
License           | Apache-2.0
Parameters        | 557,642,254 (557.6M)
Backbone          | DINOv2 (frozen, multi-scale feature extraction)
Decoder           | 10 transformer blocks, 16 heads, 1024-dim
Input             | Single image (518×518)
Output            | 81,424 animatable 3D Gaussians × 14 channels
Animation rig     | FLAME LBS + corrective blendshapes
Geometric prior   | FLAME mesh (5,023 vertices, 2× subdivided)
Training data     | VFHQ (15,204 video clips, ~3M frames)
Model size (FP16) | 1,214.6 MB
CoreML conversion | FP16 quantized for on-device inference

Animation Pipeline

Gaussian avatars are animated through two complementary systems:

FLAME LBS (primary): The FLAME rig drives head pose (jaw, neck, eyeball joints) and expression via 100 blendshape coefficients. Each subdivided vertex inherits blend weights from its parent FLAME vertices, so animation is a direct matrix operation with no neural network inference. Audio-driven animation maps speech audio to FLAME expression parameters in real-time.

ARKit Blendshape Fallback: 52 ARKit blendshape weights drive per-Gaussian position deltas via the 3D Gaussian Blendshapes technique:

deformed_position[i] = neutral[i] + sum(weight_j * delta_j[i])

A region-based deformation system operates when precomputed deltas are unavailable (jaw, mouth, eyes, brows).
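The blendshape formula above maps directly to code. A minimal Swift sketch, assuming flat arrays of Gaussian positions - the function name and array shapes are illustrative, not the app's API:

```swift
// Sketch of per-Gaussian blendshape deformation:
//   deformed[i] = neutral[i] + sum_j(weight_j * delta_j[i])
// In the real pipeline there are 52 ARKit weights; here the shapes are
// generic: one delta field per blendshape, one delta per Gaussian.
func deformGaussians(
    neutral: [SIMD3<Float>],
    deltas: [[SIMD3<Float>]],   // deltas[j][i]: offset of Gaussian i for blendshape j
    weights: [Float]            // one weight per blendshape, typically 0...1
) -> [SIMD3<Float>] {
    var deformed = neutral
    // Skip zero-weight blendshapes; they contribute nothing.
    for (j, field) in deltas.enumerated() where weights[j] != 0 {
        let w = weights[j]
        for i in deformed.indices {
            deformed[i] += w * field[i]
        }
    }
    return deformed
}
```

Because the operation is a pure weighted sum, it parallelizes trivially on the GPU; the CPU version above is just the reference semantics.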

Custom Avatar Personalities

Users select from 6 built-in AI personalities when creating a custom avatar. Each personality includes a unique character primer that shapes the companion's conversational style and responses.


5. Memory System

The DigiFrens memory system is a five-phase architecture that goes far beyond simple conversation history. It models emotional patterns, detects behavioral routines, maintains shared language, and uses multiple retrieval strategies to surface the most relevant memories.

Architecture Overview

The five-phase system splits into a store path, a query path, and background maintenance:

  • Store path: user message → store memory (returns UUID) → GTE-Small embeddings → SQLite (Float16 + zlib) → LUM extractors create belief/value/goal edges
  • Query path: query → 9 retrieval strategies (semantic, emotional, spreading, ...) → LRU cache (20MB limit) → ranked results
  • Maintenance: consolidation at session end, pruning hourly
  • Latency: cache <1ms, DB 20-50ms, semantic 100-300ms, total 150-400ms

Five Phases

Phase 1: Emotional Timeline

Mood tracking with per-emotion baselines and anomaly detection. Each conversation turn records the user's emotional state, building a timeline that reveals patterns over days, weeks, and months.

Phase 2: Proactive Intelligence

Automated follow-ups, check-ins, celebrations, and crisis detection based on life events and emotional patterns. The system proactively surfaces relevant context without being asked.

Phase 3: Pattern Detection

Detection of behavioral routines (day/time patterns), emotional triggers (90-day timeline analysis), and coping strategies (mood recovery sequences).

Phase 4: Shared Language

Inside jokes, communication quirks, and shared experiences accumulate over time, giving each companion relationship a unique vocabulary and history.

Phase 5: Context Windows

Nine retrieval strategies ensure the most relevant memories surface for each conversation:

  1. Semantic - embedding similarity
  2. Topical - tag and category matching
  3. Emotional - mood-aligned retrieval
  4. Temporal - recent and time-relevant
  5. Recency - most recent interactions
  6. Importance - high-importance memories first
  7. Social - relationship-relevant memories
  8. Associative - linked memory chains
  9. Spreading Activation - graph-based BFS traversal (see Cognitive Memory Pipeline)

Storage & Performance

Metric                         | Value
Storage format                 | SQLite with Float16 + zlib compression
Compression ratio              | ~70% storage reduction
Cache                          | Thread-safe LRU, 20MB limit
Cache hit latency              | <1ms
DB query latency               | 20-50ms
Semantic search (500 memories) | 100-300ms
Total retrieval                | 150-400ms

6. Living User Model (LUM)

The Living User Model is a cognitive graph that models the user's mental landscape. It tracks what the user believes, values, and aspires to, creating a rich understanding that goes beyond surface-level conversation.

Graph Structure

Node Types

  • Beliefs - statements the user holds true (with confidence and valence scores)
  • Values - principles the user considers important (with categories)
  • Goals - objectives the user is working toward (with progress and status)
  • Emotional Triggers - situations that reliably produce emotional responses
  • Narrative Themes - recurring life narrative patterns
  • Emergent Types - new node types automatically discovered from conversation patterns

Edge Types

Edge Type       | Description
supports        | One node reinforces another
contradicts     | Nodes are in tension
triggers        | One node activates another
motivates       | One node drives another
leadsTo         | Narrative arc connection
coEntity        | Shared named entities
coSession       | Same conversation session
temporal        | Within 24-hour window
coTopic         | Shared tags/topics
semanticSimilar | Embedding similarity > 0.65

Key Features

Emergent Schema Learning

The system automatically detects new node and edge types from conversation patterns. When users discuss concepts that don't fit existing categories, the LUM proposes new schema elements through EmergentPredicateDetector.

Life Chapters

Automatic narrative arc detection segments the user's experience into chapters (e.g., "job transition," "new relationship," "health focus"). Each chapter provides context for how current conversations relate to broader life patterns.

Mood Trajectory

Real-time classification of emotional direction:

  • Improving - positive trend
  • Declining - negative trend (triggers empathic support)
  • Stable - consistent emotional state
  • Volatile - rapid emotional shifts (triggers active listening)
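A trajectory classifier over recent mood valence samples might look like the following sketch - the thresholds, window handling, and function name are invented for illustration:

```swift
import Foundation

enum MoodTrajectory { case improving, declining, stable, volatile }

// Illustrative classifier: given recent valence samples in [-1, 1],
// look at the average step direction and the largest single swing.
// The 0.6 and 0.1 thresholds are placeholder values for the sketch.
func classifyTrajectory(valences: [Double]) -> MoodTrajectory {
    guard valences.count >= 2 else { return .stable }
    // Consecutive differences: positive = mood rising.
    let diffs = zip(valences.dropFirst(), valences).map { $0.0 - $0.1 }
    let meanDelta = diffs.reduce(0, +) / Double(diffs.count)
    let swing = diffs.map(abs).max() ?? 0
    if swing > 0.6 { return .volatile }       // rapid shifts dominate
    if meanDelta > 0.1 { return .improving }
    if meanDelta < -0.1 { return .declining }
    return .stable
}
```

Volatility is checked first because a large swing should trigger active listening even when the average trend looks flat.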

Traversal Decay & Reinforcement

  • Unused connections decay with a 30-day half-life
  • Traversed edges are reinforced proportional to usage
  • Ensures the graph stays current and relevant
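These two rules can be sketched in Swift, assuming a per-edge weight in [0, 1]; the reinforcement step size is an invented placeholder, while the 30-day half-life is from the docs:

```swift
import Foundation

// Exponential decay with a 30-day half-life: an edge untouched for
// 30 days keeps half its weight, for 60 days a quarter, and so on.
let halfLifeDays = 30.0

func decayedWeight(_ weight: Double, idleDays: Double) -> Double {
    weight * pow(0.5, idleDays / halfLifeDays)
}

// Traversal reinforcement proportional to usage, clamped to 1.0.
// The 0.05 step per traversal is an illustrative assumption.
func reinforced(_ weight: Double, traversals: Int, step: Double = 0.05) -> Double {
    min(1.0, weight + Double(traversals) * step)
}
```

Together they form a use-it-or-lose-it dynamic: frequently traversed edges saturate near 1.0 while stale ones fade toward zero.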

Context Integration

LUM insights feed directly into AI prompts through LUMContext:

LUMContext carries beliefs (with confidence), values (ranked), goals (with progress), mood trajectory (direction), an identity summary, and the active life chapter, all injected into the system prompt for every AI response.

7. Cognitive Memory Pipeline

The Cognitive Memory Pipeline brings memories into the LUM cognitive graph as first-class nodes, connecting them via typed edges. It is inspired by cognitive science research on associative memory and a 6Rs processing pipeline (Record, Reduce, Reflect, Reweave, Verify, Rethink).

Three Capabilities

Memory Reweaving

Retroactively enriches older memories when new information arrives. Operates in three tiers:

Tier                | Trigger                | Latency | Scope
Tier 1: Inline      | Each new memory stored | ~50ms   | Entity overlap detection, tag updates, importance boost
Tier 2: Session-end | Conversation ends      | Seconds | Semantic similarity edges, narrative continuation, emotional reinterpretation
Tier 3: Deep scan   | Daily maintenance      | Minutes | Full graph analysis, cross-session patterns

Example: A user mentions "interview next week" in one conversation, then says "I got the job!" two weeks later. Reweaving links these memories via a .leadsTo narrative arc edge.

Knowledge Quality Pipeline (Verify/Rethink)

Systematic quality checks on stored knowledge:

  • Contradiction detection - flags beliefs that conflict with newer information
  • Staleness detection - identifies outdated information
  • Confidence decay - reduces certainty on old, unreinforced beliefs
  • Sentiment drift analysis - detects emotional valence changes via UnifiedEmotionAnalyzer
  • Findings routing - quality issues surface through proactive intelligence (max 3 per pass to prevent flooding)

Spreading Activation

The 9th retrieval strategy. Uses breadth-first graph traversal along cognitive edges to discover memories through associative structure rather than just embedding similarity. Activation decays with each hop, and multi-path convergence receives a boost — mirroring how human associative memory works.
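A minimal sketch of the traversal, assuming a simple adjacency map; the decay factor, hop limit, and accumulation rule are illustrative choices, not the production values:

```swift
import Foundation

// Spreading activation sketch: seed nodes start with energy 1.0, each
// hop multiplies energy by `decay`, and energy arriving at the same
// node via multiple paths accumulates (the "convergence boost").
func spreadActivation(
    from seeds: [String],
    edges: [String: [String]],   // node -> neighbors along cognitive edges
    decay: Double = 0.5,
    maxHops: Int = 3
) -> [String: Double] {
    var activation: [String: Double] = [:]
    var frontier: [(String, Double)] = seeds.map { ($0, 1.0) }
    for _ in 0..<maxHops {
        var next: [(String, Double)] = []
        for (node, energy) in frontier {
            activation[node, default: 0] += energy
            for neighbor in edges[node] ?? [] {
                next.append((neighbor, energy * decay))
            }
        }
        frontier = next
    }
    // Flush the final frontier so leaf nodes at maxHops still register.
    for (node, energy) in frontier { activation[node, default: 0] += energy }
    return activation
}
```

Ranking memories by accumulated activation surfaces items that embedding similarity alone would miss, because relevance flows through the graph's structure rather than through text alone.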

Data Flow

When a new memory is stored, it flows through the pipeline in stages:

  1. Edge creation - MemoryEdgeManager adds co_entity, co_session, temporal, and co_topic edges
  2. Tier 1 inline (~50ms) - MemoryReweaver performs entity overlap detection, tag updates, and importance boosts
  3. Session end - MemoryReweaver Tier 2 adds semantic edges, narrative continuation, and emotional reinterpretation
  4. Daily maintenance - MemoryReweaver Tier 3 deep scan plus KnowledgeQualityPipeline (verify + rethink): edge maintenance (7-day decay, 50K cap), contradictions, staleness, confidence, sentiment

Limits: edge cap 50,000 • reweave limit 3 per memory • entity index with O(1) lookup.

8. AI & Intelligence

Multi-Provider Architecture

DigiFrens supports six AI providers through a unified abstraction layer (AIService), giving users flexibility in cost, quality, and privacy.

Provider           | Models                             | Cost              | Notes
Apple Intelligence | 3B on-device model                 | Free              | No API key, works offline, iOS 26+ required
OpenAI             | GPT-4.1 Nano, Mini, GPT-4o         | User API key      | Cloud-based
Anthropic          | Claude Haiku 4.5, Sonnet 4.5, Opus | User API key      | Cloud-based
Local LEAP         | On-device LEAP SDK models          | DigiFrens+        | No network required
OpenRouter         | Various free and paid models       | User API key      | Model aggregator
OpenClaw           | Self-hosted models                 | WebSocket gateway | Self-hosted option

Context Building

The ContextBuilder assembles a comprehensive system prompt for every AI request, including:

  • Relevant memories (multi-strategy retrieval)
  • Emotional history and current mood
  • Mental process state and prompt
  • Avatar personality blueprint (HEXACO traits)
  • Calendar context (upcoming events)
  • Shared language (inside jokes, quirks)
  • LUM cognitive context (beliefs, values, goals)
  • Spreading activation results

Context is truncated to a configurable maxContextTokens (default: 2000) to stay within provider limits.
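Truncation can be sketched as greedy budget filling. The chars/4 token estimate and the priority ordering below are assumptions for illustration, not the actual ContextBuilder logic:

```swift
import Foundation

// Greedy context assembly under a token budget. Each section carries a
// priority; high-priority sections (e.g. mental process state) are
// admitted first, and anything that would overflow the budget is dropped.
// The chars/4 token estimate is a rough placeholder heuristic.
func buildContext(sections: [(priority: Int, text: String)],
                  maxContextTokens: Int = 2000) -> String {
    func tokens(_ s: String) -> Int { max(1, s.count / 4) }
    var budget = maxContextTokens
    var kept: [String] = []
    for section in sections.sorted(by: { $0.priority > $1.priority }) {
        let cost = tokens(section.text)
        if cost <= budget {
            kept.append(section.text)
            budget -= cost
        }
    }
    return kept.joined(separator: "\n\n")
}
```

Dropping whole sections, rather than clipping them mid-sentence, keeps each included block coherent for the model.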

Mental Process (OpenSouls Pattern)

An adaptive conversation state machine integrated with LUM:

11 Mental States

State             | When Used
Crisis Support    | Detected distress or safety concerns
Empathic Support  | Declining mood trajectory
Active Listening  | Volatile emotions
Deep Conversation | Goal discussion or intellectual topics
Problem Solving   | User seeking practical help
Celebration       | Milestones or achievements
Playful Banter    | Light, fun interactions
Storytelling      | Narrative or experience sharing
Casual Chat       | Default relaxed conversation
Processing        | Absorbing complex information
Transitioning     | Shifting between modes

LUM-Aware State Selection

The mental process considers LUM data when selecting states:

  • Declining mood - +0.3 weight toward empathic support
  • Volatile emotions - +0.2 weight toward active listening
  • Goal mention - +0.4 weight toward deep conversation
  • Negative self-beliefs - applies gentler response modifiers
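The weight adjustments above can be sketched as a scoring pass; the struct, key names, and base-score dictionary below are invented for the example:

```swift
import Foundation

// Hypothetical LUM signal bundle consumed by state selection.
struct LUMSignals {
    var moodDeclining = false
    var moodVolatile = false
    var goalMentioned = false
}

// Apply the documented weight bumps on top of base state scores.
// State keys are illustrative names, not the app's identifiers.
func stateWeights(base: [String: Double], lum: LUMSignals) -> [String: Double] {
    var w = base
    if lum.moodDeclining { w["empathicSupport", default: 0] += 0.3 }
    if lum.moodVolatile  { w["activeListening", default: 0] += 0.2 }
    if lum.goalMentioned { w["deepConversation", default: 0] += 0.4 }
    return w
}
```

The state with the highest resulting weight would win selection, so a goal mention (+0.4) outranks a declining mood (+0.3) when both fire at once.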

Response Modifiers

Each state adjusts response parameters:

  • Verbosity level
  • Question frequency
  • Reflection depth
  • Humor level
  • Formality

Streaming

The AI system supports streaming responses with:

  • Streaming API responses - tokens appear as they're generated
  • Parallel context building - context assembly runs concurrently
  • TTS pipelining - voice synthesis begins before the full response completes

Embeddings

On-device CoreML embeddings via GTE-Small (384-dimensional). Embeddings never leave the device, powering semantic search across memories.
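Semantic search over these embeddings reduces to cosine similarity ranking. A self-contained sketch - the function names and memory tuple shape are illustrative:

```swift
import Foundation

// Cosine similarity between two equal-length embedding vectors
// (384-dim in the GTE-Small case; any length works here).
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    var dot: Float = 0, na: Float = 0, nb: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    return dot / (sqrt(na) * sqrt(nb) + 1e-9)   // epsilon avoids div-by-zero
}

// Rank stored memories against a query embedding, returning the top-k IDs.
func topMemories(query: [Float],
                 memories: [(id: String, embedding: [Float])],
                 k: Int = 5) -> [String] {
    memories
        .map { ($0.id, cosineSimilarity(query, $0.embedding)) }
        .sorted { $0.1 > $1.1 }
        .prefix(k)
        .map { $0.0 }
}
```

In production this brute-force scan is what the 100-300ms "semantic search (500 memories)" figure measures; the LRU cache short-circuits repeats.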


9. Emotion System

Detection Architecture

User text passes through a six-signal fusion (semantic, linguistic, sentiment, contextual, historical, explicit), refined by sarcasm detection, context bias correction, and adaptive learning, to produce an EmotionalState spanning 7 core and 8 complex emotions. The result drives complex emotion mapping, avatar expressions, and timeline storage via UnifiedEmotionAnalyzer, AdaptiveEmotionLearner, EmotionalTimelineManager, and MemoryGraphV2.

Emotion Categories

Core Emotions (7)

happy, sad, angry, surprised, excited, confused, neutral

Complex Emotions (8)

tired, anxious, content, frustrated, grateful, bored, embarrassed, proud

Detection Features

  • 6-signal weighted fusion - semantic analysis, linguistic markers, sentiment scoring, contextual cues, historical patterns, and explicit statements
  • Sarcasm detection - identifies when literal text contradicts intended emotion
  • Context bias correction - adjusts for conversation context
  • Adaptive learning - self-improving system that calibrates to each user's expression patterns

10. Personality Evolution

HEXACO Model

DigiFrens uses the HEXACO personality framework - six core traits on a 0.0 to 1.0 scale:

Trait             | Low End                      | High End
Honesty-Humility  | Manipulative, self-important | Sincere, modest, fair
Emotionality      | Stoic, detached              | Empathetic, anxious, sentimental
Extraversion      | Reserved, quiet              | Social, energetic, cheerful
Agreeableness     | Critical, stubborn           | Forgiving, flexible, patient
Conscientiousness | Spontaneous, disorganized    | Organized, diligent, perfectionist
Openness          | Practical, conventional      | Creative, curious, unconventional

Per-Avatar Baselines

Avatar | HEXACO (H / E / X / A / C / O)    | Character
Haru   | 65% / 40% / 45% / 60% / 70% / 70% | Cool, introspective
Emi    | 70% / 85% / 75% / 90% / 65% / 60% | Warm, expressive
Hiyori | 80% / 55% / 40% / 65% / 90% / 85% | Studious, intellectual
Mao    | 55% / 60% / 80% / 50% / 45% / 75% | Mischievous, playful

Evolution Mechanics

  • Session updates: ~1% change per trait per session based on conversation metrics (depth, positivity, engagement)
  • Relationship multiplier: Scales from 0.5x (new companion) to 1.5x (soulmate level)
  • Weekly decay: 0.5% per week toward baseline when inactive, ensuring personalities return to character when not reinforced
  • Trait bounds: Always constrained to 0.0-1.0
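The update and decay rules can be sketched as follows, assuming "0.5% per week" means 0.5% of the remaining gap to baseline per inactive week (the exact formula is an assumption for illustration):

```swift
import Foundation

// Per-session trait update: sessionDelta is roughly ±0.01 per session,
// scaled by the relationship multiplier (0.5x new to 1.5x soulmate),
// then clamped to the documented 0.0...1.0 bounds.
func evolveTrait(current: Double, sessionDelta: Double,
                 relationshipMultiplier: Double) -> Double {
    min(1.0, max(0.0, current + sessionDelta * relationshipMultiplier))
}

// Weekly decay toward baseline while inactive: close a fraction of the
// gap each week so personalities drift back toward character.
func weeklyDecay(current: Double, baseline: Double,
                 inactiveWeeks: Double) -> Double {
    current + (baseline - current) * min(1.0, 0.005 * inactiveWeeks)
}
```

The clamp and the pull toward baseline together keep traits stable: no amount of conversation pushes a trait past its bounds, and neglect never overshoots the baseline.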

AI Integration

Personality traits directly influence avatar responses through behavioral prompts injected into the AI system prompt. Higher extraversion produces more talkative responses; higher emotionality produces more empathetic ones.

Visualization

A radar chart (PersonalityRadarChartView) displays current trait levels against baseline values on a hexagonal chart, letting users see how their companion's personality has evolved.


11. Voice System

Architecture

speakResponse() in ConversationViewModel routes elevenlabs_* voices to streaming TTS (audio chunks) and system voices to VoiceService (AVSpeech). Both paths feed lip synchronization: 5 viseme shapes, phoneme estimation, audio energy analysis, and per-engine adaptation.

Voice Options

Premium Voices (DigiFrens+)

ElevenLabs integration with 30+ neural voices. Features:

  • Streaming TTS with word-by-word captions
  • Per-avatar voice assignment
  • Natural prosody and expressiveness

System Voices (Free)

Built-in AVSpeechSynthesizer voices available on all devices.

Planned: On-Device TTS (Kokoro)

Kokoro TTS (82M params, ~86MB quantized) is planned as a free offline alternative:

Feature | ElevenLabs     | Kokoro (Planned) | System
Quality | Excellent      | Good             | Basic
Cost    | DigiFrens+     | Free             | Free
Offline | No             | Yes              | Yes
Voices  | 30+            | 11               | Many
Latency | ~200ms network | ~300ms local     | Instant

Lip Synchronization

Real-time lip sync maps audio/text to viseme shapes on the avatar:

  • 5 viseme categories for VRM avatars
  • Text-based phoneme estimation for immediate sync
  • Audio energy analysis for natural movement timing
  • Per-engine adaptation - VRM morph targets, Live2D parameters, Gaussian position deltas

12. Calendar Integration

Status: Production-ready

Features

  • Read calendar events with natural language queries
  • Create events and reminders from natural language
  • Support for recurring events (daily, weekly, monthly)
  • Advanced time parsing (ranges, relative dates, durations)
  • Proactive 30-minute event reminders
  • Schedule stress analysis and break suggestions
  • Graceful degradation without calendar permissions

Natural Language Parsing

Input                                   | Interpretation
"Meeting tomorrow at 9am-3pm"           | 6-hour event, tomorrow
"Remind me to call Mom next Monday"     | Reminder, next Monday
"Block time this afternoon for 2 hours" | 2-hour event, today at 2pm
"Cancel my dentist appointment"         | Event cancellation

13. Proactive Intelligence

The proactive intelligence system enables companions to initiate contextually relevant interactions without being prompted.

Action Types

Type          | Trigger                    | Example
Follow-up     | Life event needs follow-up | "How did the interview go?"
Check-in      | 3+ days inactivity         | "Hey, I haven't heard from you in a while"
Celebration   | Milestones reached         | "Congrats on reaching your reading goal!"
Concern       | Crisis or anomaly detected | Supportive outreach during emotional distress
Reminder      | User goal or intention     | "You mentioned wanting to start exercising"
Encouragement | Upcoming event support     | "Good luck with your presentation tomorrow"

Pattern Detection

Pattern Type        | Analysis
Emotional Triggers  | 90-day timeline analysis identifies situations that reliably produce specific emotions
Coping Strategies   | Tracks mood recovery sequences to understand what helps the user feel better
Behavioral Routines | Day/time patterns reveal the user's natural rhythms

Data Flow

Conversations feed two analysis layers: LifeContextTracker (life events, follow-ups) and PatternDetectionService (emotional triggers, coping strategies, routines, built on the 90-day timeline, mood recovery sequences, and day/time patterns). Both feed ProactiveIntelligenceEngine, which emits proactive actions, and ReflectiveMemoryProcessor, which produces context hints.

14. Privacy & Security

Design Principles

DigiFrens follows a local-first, privacy-by-default architecture. All sensitive data stays on the user's device.

Data Storage Security

Data                 | Storage          | Security Level
Conversations        | Local SQLite     | Device-only, never uploaded
Memories             | Local SQLite     | Device-only
LUM cognitive graph  | Local SQLite     | Device-only
Emotional timeline   | Local SQLite     | Device-only
API keys             | Keychain         | Hardware-encrypted, kSecAttrAccessibleWhenUnlockedThisDeviceOnly
Device ID            | Keychain         | Hardware-encrypted, device-local
Passkey credentials  | Keychain         | Hardware-encrypted
User preferences     | UserDefaults     | Standard
Custom avatar models | Documents folder | Device-only

Authentication

  • Automatic device-based accounts - no sign-up required
  • Optional passkey security - WebAuthn protocol with biometric verification
  • No email or password - device ID serves as the user identifier

On-Device Processing

Capability            | Implementation
Text embeddings       | CoreML (GTE-Small, 384-dim)
Emotion analysis      | On-device NLP + learned models
Avatar reconstruction | CoreML (LAM, 557M params, FLAME-guided)
AI responses          | Apple Intelligence (on-device, optional)

Cloud Interactions

The only cloud interactions are:

  • AI providers (optional) - when using OpenAI, Anthropic, or OpenRouter
  • ElevenLabs (optional) - premium voice synthesis
  • Subscription verification - StoreKit receipt validation
  • Model download - one-time LAM model download for custom avatars

Conversation content is never sent to DigiFrens servers.


15. Subscription Model

Tiers

Free ($0)

  • 2 VRM avatars (Haru, Emi) + 2 Live2D avatars (Hiyori, Mao)
  • Apple Intelligence AI (on supported devices)
  • Bring your own API keys (OpenAI, Anthropic, OpenRouter)
  • Basic system voices
  • Full memory system and LUM
  • Calendar integration

DigiFrens+ ($15/month)

  • Everything in Free, plus:
  • Download and use local LEAP LLMs (no API key needed)
  • Use GPT without an API key
  • Premium ElevenLabs voices (30+ options)
  • Custom avatar creation (Gaussian Splatting)
  • Unlimited interaction time
  • Up to 3 custom avatars
  • Voice customization per avatar
  • Priority processing
  • Early access to new features
  • Priority support
  • 1-week free trial included

16. Platform Requirements

System Requirements

Requirement | Minimum   | Recommended
iOS version | 26.0      | 26.0+
Device      | iPhone 11 | iPhone 15 Pro+
Storage     | ~500MB    | ~2GB (with custom avatars)

17. Development Status & Roadmap

Current Status

Development began July 2025. The core platform — triple avatar engine, five-phase memory system, LUM cognitive graph, six AI providers, premium voice synthesis, and calendar integration — is fully implemented and functional.

Current focus areas include on-device custom avatar reconstruction via CoreML and Gaussian Splatting rendering polish.

Roadmap

Feature           | Description                                  | Status
On-device TTS     | Kokoro 82M-param model as free offline voice | Planned
Desktop companion | macOS app via Catalyst or native             | Documented
Live2D widgets    | Home and lock screen widgets                 | Documented
Multimodal input  | Image and audio input support                | Documented
Crypto payments   | x402 payment agent integration               | Documented
AR integration    | Augmented reality avatar overlay             | Whitepaper Phase 2

18. Codebase Statistics

Metric                      | Value
Swift source files          | 175
Service domains             | 14
AI providers                | 6
Avatar engines              | 3 (VRM, Live2D, Gaussian Splat)
Memory retrieval strategies | 9
Mental process states       | 11
Emotion categories          | 15 (7 core + 8 complex)
HEXACO personality traits   | 6
Database tables             | 20+

Appendix: Key References

Research Papers

  • LAM: Large Avatar Model (SIGGRAPH 2025, arXiv:2502.17796) - single-image feed-forward animatable Gaussian avatar reconstruction via FLAME-guided transformer
  • FLAME: Faces Learned with an Articulated Model and Expressions (SIGGRAPH Asia 2017) - statistical 3D morphable face model trained on 33,000+ scans, providing shape/expression/pose disentanglement
  • 3D Gaussian Splatting for Real-Time Radiance Field Rendering (SIGGRAPH 2023) - explicit radiance field representation using anisotropic 3D Gaussians
  • 3D Gaussian Blendshapes (SIGGRAPH 2024) - pure linear blendshape deformation for Gaussians
  • HEXACO Personality Model - six-factor personality framework

Open Source Dependencies

Package           | License    | Purpose
MetalSplatter     | MIT        | Gaussian splat rendering on Metal
SplatIO           | MIT        | .splat/.ply/.spz file I/O
spz-swift         | MIT        | SPZ compressed format support
Live2D Cubism SDK | Commercial | 2D avatar animation
LAM               | Apache-2.0 | Feed-forward avatar reconstruction (FLAME + Gaussian Splatting)
FLAME             | CC-BY-4.0  | 3D morphable face model (geometric prior + animation rig)

Inspiration

  • Cognitive science research - associative memory models, spreading activation retrieval, 6Rs processing pipeline
  • OpenSouls - mental process state machine pattern for adaptive conversation

DigiFrens - Built with Swift, SwiftUI, and the power of Apple's ecosystem.

Version 3.1 | April 2026 | All rights reserved.