DigiFrens Technical Documentation
Version 3.0 | March 2026
1. Executive Summary
DigiFrens is a sophisticated iOS AI companion application that combines animated avatars, intelligent memory systems, and natural voice interaction to create meaningful digital relationships. Unlike conventional AI assistants optimized for productivity, DigiFrens is designed around emotional connection - companions that remember, evolve, and respond to users as individuals.
The application features a triple avatar engine (3D VRM, 2D Live2D, and photorealistic Gaussian Splatting), a five-phase emotional memory system with cognitive graph integration, six AI providers including free on-device Apple Intelligence, and premium voice synthesis. All conversation data is stored locally on the user's device, with no cloud dependency for core functionality.
Key Differentiators:
- Privacy-first architecture - conversations never leave the device
- Living User Model (LUM) - a cognitive graph that models the user's beliefs, values, and goals
- Cognitive Memory Pipeline - spreading activation retrieval inspired by human associative memory
- On-device avatar creation - single selfie to photorealistic animated avatar via CoreML
- Adaptive conversation intelligence - 11-state mental process that responds to emotional context
2. Product Overview
What is DigiFrens?
DigiFrens creates AI entities with visual presence, unique voices, persistent memories, and consistent personalities. Each companion provides judgment-free interaction and emotional support through natural conversation.
Core Experience
Users interact with animated avatar companions through text or voice. Each conversation is enriched by:
- Visual presence - avatars express emotions in real-time through facial expressions, gestures, and idle animations
- Voice interaction - premium neural voice synthesis with synchronized lip movements
- Persistent memory - companions remember personal details, past conversations, and emotional patterns
- Personality evolution - avatar traits shift based on conversation dynamics over time
- Proactive intelligence - companions initiate follow-ups, check-ins, and celebrations based on life events
User Flow
- Launch - automatic device-based authentication (no sign-up required)
- Landing page - horizontal carousel of available avatars with recent conversation previews
- Avatar selection - tap to start or resume a conversation
- Conversation - text or voice input with real-time avatar responses
- Settings - configure AI provider, voice, subscription, and security
Available Companions
| Name | Engine | Personality |
|---|---|---|
| Haru | VRM (3D) | Cool, introspective, main character energy |
| Emi | VRM (3D) | Kind, cheerful, playful, optimistic |
| Hiyori | Live2D (2D) | Cheerful, energetic, bubbly, studious |
| Mao | Live2D (2D) | Mischievous, playful, witty, curious |
| Custom | Gaussian Splat | User-created from selfie, personality selectable |
3. Technical Architecture
Platform
- iOS 26.0+ (iPhone 11 or newer)
- Swift 6.0+ / Xcode 16.0+
- iPhone 15 Pro+ for Apple Intelligence features
- ~500MB storage (app + models + CoreML cache)
Architecture
The codebase follows MVVM architecture with 14 service domains covering AI, memory, emotion, voice, calendar, avatar management, and more. Three dedicated engine modules handle avatar rendering (VRM, Live2D, Gaussian Splatting), each conforming to a shared protocol for unified expression control.
Key Technologies
| Technology | Purpose |
|---|---|
| SwiftUI | Declarative UI with @Observable macro |
| SceneKit | VRM 3D rendering (Metal backend) |
| Live2D Cubism SDK | 2D avatar animation (Metal renderer) |
| MetalSplatter | Gaussian splat rendering |
| CoreML | On-device embeddings (GTE-Small) + avatar reconstruction (LAM) |
| Foundation Models | Apple Intelligence on-device AI (iOS 26+) |
| SQLite | Local memory and LUM graph persistence |
| StoreKit 2 | Subscription management |
| EventKit | Calendar integration |
| Vision | Face detection for avatar capture |
| AVFoundation | Audio recording, camera, speech |
Conversation Flow
Data Layer
| Store | Contents | Security |
|---|---|---|
| SQLite | Memories, LUM graph, emotions, sessions, personality evolution | Device-local |
| Keychain | API keys, device ID, passkey credentials | Hardware-encrypted |
| UserDefaults | Preferences, settings | Standard |
4. Avatar System
DigiFrens features a triple-engine avatar system, each optimized for different use cases.
Engine Architecture
All three engines conform to AvatarEngineProtocol, providing a unified interface for expression control, lip sync, and idle animation.
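A minimal sketch of what such a shared protocol might look like, using the 7 emotions and 5 visemes described below. The real AvatarEngineProtocol may differ in shape and naming; the `NullEngine` conformer is purely illustrative.

```swift
import Foundation

enum Emotion: String {
    case happy, sad, angry, surprised, excited, confused, neutral
}

enum Viseme: String {
    case a, e, i, o, u   // 5 viseme shapes for lip sync
}

protocol AvatarEngineProtocol {
    /// Blend toward an emotion expression over `duration` seconds.
    func setExpression(_ emotion: Emotion, intensity: Float, duration: TimeInterval)
    /// Drive the mouth shape for the currently spoken phoneme.
    func setViseme(_ viseme: Viseme, weight: Float)
    /// Start/stop the breathing, blinking, and sway loops.
    func startIdleAnimation()
    func stopIdleAnimation()
}

/// A no-op engine, useful for previews and tests.
final class NullEngine: AvatarEngineProtocol {
    private(set) var lastEmotion: Emotion = .neutral
    func setExpression(_ emotion: Emotion, intensity: Float, duration: TimeInterval) {
        lastEmotion = emotion
    }
    func setViseme(_ viseme: Viseme, weight: Float) {}
    func startIdleAnimation() {}
    func stopIdleAnimation() {}
}
```

A unified interface like this is what lets conversation and voice code remain engine-agnostic: expression and lip-sync calls are identical whether the companion is VRM, Live2D, or Gaussian.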
VRM Engine (3D)
Avatars: Haru, Emi (~35MB total)
The VRM engine renders 3D avatar models using SceneKit with a Metal backend. It provides:
- 60 FPS unified animation loop with layered blending
- 7 emotion expressions (happy, sad, angry, surprised, excited, confused, neutral)
- 5 viseme shapes for lip synchronization
- Idle animations - breathing, blinking, head movement, body sway
- Physics simulation - hair and clothing dynamics
- NSCache with 100MB byte limit and automatic memory pressure handling
Live2D Engine (2D)
Avatars: Hiyori, Mao (~32MB)
The Live2D engine uses the Cubism SDK with a Metal renderer (migrated from OpenGL ES for iOS 26 compatibility). It features:
- Emotion-driven facial animations with parameter mapping
- Swift-to-C++ bridge via Objective-C++ (Live2DBridge.mm)
- Metal rendering via Live2DMetalView
Gaussian Splatting Engine (Photorealistic Custom Avatars)
Status: In development (core pipeline complete)
The Gaussian Splatting engine enables users to create photorealistic animated avatars from a single selfie photo, rendered via MetalSplatter at 30-60 FPS.
On-Device Reconstruction Pipeline
Performance Comparison
| Metric | Previous (Cloud) | Current (On-Device) |
|---|---|---|
| Input | 60-second guided video | 1 selfie photo |
| Wait time | 5-15 minutes | TBD (benchmarking) |
| Network | 200MB up, 10MB down | Model download only (~1.2GB, one-time) |
| Cost per avatar | $0.50-1.50 (cloud GPU) | $0 |
| Works offline | No | Yes (after initial download) |
Animation
Gaussian avatars are animated using the 3D Gaussian Blendshapes technique:
deformed_position[i] = neutral[i] + sum(weight_j * delta_j[i])
52 ARKit blendshape weights drive per-Gaussian position deltas. A fallback region-based deformation system operates when precomputed deltas are unavailable (jaw, mouth, eyes, brows).
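The blendshape sum above can be sketched in Swift as a straightforward linear combination. In practice the per-Gaussian deformation would run on the GPU; this CPU version with illustrative names only demonstrates the math.

```swift
import simd

func deformGaussians(
    neutral: [SIMD3<Float>],     // neutral position per Gaussian
    deltas: [[SIMD3<Float>]],    // [blendshape][gaussian] position deltas
    weights: [Float]             // 52 ARKit blendshape weights, 0...1
) -> [SIMD3<Float>] {
    var deformed = neutral
    // deformed[i] = neutral[i] + sum_j(weight_j * delta_j[i])
    for (j, delta) in deltas.enumerated() where weights[j] > 0 {
        let w = weights[j]
        for i in 0..<deformed.count {
            deformed[i] += w * delta[i]
        }
    }
    return deformed
}
```

Skipping zero-weight blendshapes keeps the per-frame cost proportional to the number of active expressions rather than all 52.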
LAM Model Details
| Property | Value |
|---|---|
| Paper | LAM: Large Avatar Model (SIGGRAPH 2025) |
| License | Apache-2.0 |
| Parameters | 557,642,254 (557.6M) |
| Input | Single image (518x518) |
| Output | 20,018 animatable 3D Gaussians x 14 channels |
| Animation | FLAME Linear Blend Skinning + blendshapes |
| Model size (FP16) | 1,214.6 MB |
Custom Avatar Personalities
Users select from 6 built-in AI personalities when creating a custom avatar. Each personality includes a unique character primer that shapes the companion's conversational style and responses.
5. Memory System
The DigiFrens memory system is a five-phase architecture that goes far beyond simple conversation history. It models emotional patterns, detects behavioral routines, maintains shared language, and uses multiple retrieval strategies to surface the most relevant memories.
Architecture Overview
Five Phases
Phase 1: Emotional Timeline
Mood tracking with per-emotion baselines and anomaly detection. Each conversation turn records the user's emotional state, building a timeline that reveals patterns over days, weeks, and months.
Phase 2: Proactive Intelligence
Automated follow-ups, check-ins, celebrations, and crisis detection based on life events and emotional patterns. The system proactively surfaces relevant context without being asked.
Phase 3: Pattern Detection
Detection of behavioral routines (day/time patterns), emotional triggers (90-day timeline analysis), and coping strategies (mood recovery sequences).
Phase 4: Shared Language
Inside jokes, communication quirks, and shared experiences accumulate over time, giving each companion relationship a unique vocabulary and history.
Phase 5: Context Windows
Nine retrieval strategies ensure the most relevant memories surface for each conversation:
- Semantic - embedding similarity
- Topical - tag and category matching
- Emotional - mood-aligned retrieval
- Temporal - recent and time-relevant
- Recency - most recent interactions
- Importance - high-importance memories first
- Social - relationship-relevant memories
- Associative - linked memory chains
- Spreading Activation - graph-based BFS traversal (see Cognitive Memory Pipeline)
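One way the nine strategies above could be combined is weighted score fusion: each strategy nominates candidates, and memories surfaced by multiple strategies accumulate score. The strategy names come from the list; the fusion logic and types here are assumptions, not the actual implementation.

```swift
enum RetrievalStrategy: CaseIterable {
    case semantic, topical, emotional, temporal, recency,
         importance, social, associative, spreadingActivation
}

struct ScoredMemory {
    let id: Int
    var score: Double
}

/// Merge per-strategy candidate lists into one ranked list by summing
/// weighted scores; multi-strategy hits naturally rise to the top.
func fuse(_ results: [RetrievalStrategy: [ScoredMemory]],
          weights: [RetrievalStrategy: Double]) -> [ScoredMemory] {
    var combined: [Int: Double] = [:]
    for (strategy, memories) in results {
        let w = weights[strategy, default: 1.0]
        for m in memories {
            combined[m.id, default: 0] += w * m.score
        }
    }
    return combined
        .map { ScoredMemory(id: $0.key, score: $0.value) }
        .sorted { $0.score > $1.score }
}
```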
Storage & Performance
| Metric | Value |
|---|---|
| Storage format | SQLite with Float16 + zlib compression |
| Compression ratio | ~70% storage reduction |
| Cache | Thread-safe LRU, 20MB limit |
| Cache hit latency | <1ms |
| DB query latency | 20-50ms |
| Semantic search (500 memories) | 100-300ms |
| Total retrieval | 150-400ms |
6. Living User Model (LUM)
The Living User Model is a cognitive graph that models the user's mental landscape. It tracks what the user believes, values, and aspires to, creating a rich understanding that goes beyond surface-level conversation.
Graph Structure
Node Types
- Beliefs - statements the user holds true (with confidence and valence scores)
- Values - principles the user considers important (with categories)
- Goals - objectives the user is working toward (with progress and status)
- Emotional Triggers - situations that reliably produce emotional responses
- Narrative Themes - recurring life narrative patterns
- Emergent Types - new node types automatically discovered from conversation patterns
Edge Types
| Edge Type | Description |
|---|---|
| supports | One node reinforces another |
| contradicts | Nodes are in tension |
| triggers | One node activates another |
| motivates | One node drives another |
| leadsTo | Narrative arc connection |
| coEntity | Shared named entities |
| coSession | Same conversation session |
| temporal | Within a 24-hour window |
| coTopic | Shared tags/topics |
| semanticSimilar | Embedding similarity > 0.65 |
Key Features
Emergent Schema Learning
The system automatically detects new node and edge types from conversation patterns. When users discuss concepts that don't fit existing categories, the LUM proposes new schema elements through EmergentPredicateDetector.
Life Chapters
Automatic narrative arc detection segments the user's experience into chapters (e.g., "job transition," "new relationship," "health focus"). Each chapter provides context for how current conversations relate to broader life patterns.
Mood Trajectory
Real-time classification of emotional direction:
- Improving - positive trend
- Declining - negative trend (triggers empathic support)
- Stable - consistent emotional state
- Volatile - rapid emotional shifts (triggers active listening)
Traversal Decay & Reinforcement
- Unused connections decay with a 30-day half-life
- Traversed edges are reinforced proportional to usage
- Ensures the graph stays current and relevant
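The decay and reinforcement rules above reduce to a couple of one-liners. The half-life comes from the text; the function shapes and the reinforcement increment are illustrative assumptions.

```swift
import Foundation

let halfLifeDays = 30.0

/// Exponential decay: an unused edge's weight halves every 30 days.
func decayedWeight(_ weight: Double, daysSinceLastTraversal: Double) -> Double {
    weight * pow(0.5, daysSinceLastTraversal / halfLifeDays)
}

/// Reinforcement: each traversal bumps the edge back up, capped at 1.0.
/// The boost size is an assumed parameter.
func reinforced(_ weight: Double, traversals: Int,
                boostPerTraversal: Double = 0.05) -> Double {
    min(1.0, weight + Double(traversals) * boostPerTraversal)
}
```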
Context Integration
LUM insights feed directly into AI prompts through LUMContext.
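A hypothetical shape for that payload, with field names assumed from the node types and mood trajectories described above; the real LUMContext may be structured differently.

```swift
struct LUMContext {
    var activeBeliefs: [String]   // high-confidence beliefs relevant now
    var coreValues: [String]
    var activeGoals: [String]     // in-progress goals with recent mentions
    var moodTrajectory: String    // "improving" | "declining" | "stable" | "volatile"
    var currentChapter: String?   // e.g. "job transition"

    /// Render as a compact section of the AI system prompt.
    func promptSection() -> String {
        var lines = ["## What I know about you"]
        if !activeBeliefs.isEmpty { lines.append("Beliefs: " + activeBeliefs.joined(separator: "; ")) }
        if !coreValues.isEmpty { lines.append("Values: " + coreValues.joined(separator: "; ")) }
        if !activeGoals.isEmpty { lines.append("Goals: " + activeGoals.joined(separator: "; ")) }
        lines.append("Mood trajectory: \(moodTrajectory)")
        if let chapter = currentChapter { lines.append("Life chapter: \(chapter)") }
        return lines.joined(separator: "\n")
    }
}
```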
7. Cognitive Memory Pipeline
The Cognitive Memory Pipeline brings memories into the LUM cognitive graph as first-class nodes, connecting them via typed edges. It is inspired by cognitive science research on associative memory and a 6Rs processing pipeline (Record, Reduce, Reflect, Reweave, Verify, Rethink).
Three Capabilities
Memory Reweaving
Retroactively enriches older memories when new information arrives. Operates in three tiers:
| Tier | Trigger | Latency | Scope |
|---|---|---|---|
| Tier 1: Inline | Each new memory stored | ~50ms | Entity overlap detection, tag updates, importance boost |
| Tier 2: Session-end | Conversation ends | Seconds | Semantic similarity edges, narrative continuation, emotional reinterpretation |
| Tier 3: Deep scan | Daily maintenance | Minutes | Full graph analysis, cross-session patterns |
Example: A user mentions "interview next week" in one conversation, then says "I got the job!" two weeks later. Reweaving links these memories via a .leadsTo narrative arc edge.
Knowledge Quality Pipeline (Verify/Rethink)
Systematic quality checks on stored knowledge:
- Contradiction detection - flags beliefs that conflict with newer information
- Staleness detection - identifies outdated information
- Confidence decay - reduces certainty on old, unreinforced beliefs
- Sentiment drift analysis - detects emotional valence changes via UnifiedEmotionAnalyzer
- Findings routing - quality issues surface through proactive intelligence (max 3 per pass to prevent flooding)
Spreading Activation
The 9th retrieval strategy. Uses breadth-first graph traversal along cognitive edges to discover memories through associative structure rather than just embedding similarity. Activation decays with each hop, and multi-path convergence receives a boost — mirroring how human associative memory works.
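The traversal above can be sketched as a bounded BFS in which activation decays per hop and activation arriving via multiple paths accumulates (the convergence boost). Types, parameter values, and names here are illustrative assumptions.

```swift
struct CognitiveGraph {
    var edges: [Int: [(neighbor: Int, weight: Double)]]
}

func spreadActivation(
    from seeds: [Int],
    in graph: CognitiveGraph,
    decayPerHop: Double = 0.5,
    maxHops: Int = 3,
    threshold: Double = 0.05
) -> [Int: Double] {
    var activation: [Int: Double] = Dictionary(uniqueKeysWithValues: seeds.map { ($0, 1.0) })
    var frontier = activation
    for _ in 0..<maxHops {
        var next: [Int: Double] = [:]
        for (node, energy) in frontier {
            for (neighbor, weight) in graph.edges[node] ?? [] {
                let passed = energy * weight * decayPerHop
                if passed >= threshold {
                    // Multi-path convergence: energy from several routes adds up.
                    next[neighbor, default: 0] += passed
                }
            }
        }
        if next.isEmpty { break }
        for (node, energy) in next {
            activation[node, default: 0] += energy
        }
        frontier = next
    }
    return activation
}
```

A node two hops out that is reachable via two seeds or two intermediate nodes ends up with roughly double the activation of one reached by a single path, which is exactly the associative effect the strategy is after.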
Data Flow
8. AI & Intelligence
Multi-Provider Architecture
DigiFrens supports six AI providers through a unified abstraction layer (AIService), giving users flexibility in cost, quality, and privacy.
| Provider | Models | Cost | Notes |
|---|---|---|---|
| Apple Intelligence | 3B on-device model | Free | No API key, works offline, iOS 26+ required |
| OpenAI | GPT-4.1 Nano, Mini, GPT-4o | User API key | Cloud-based |
| Anthropic | Claude Haiku 4.5, Sonnet 4.5, Opus | User API key | Cloud-based |
| Local LEAP | On-device LEAP SDK models | DigiFrens+ | No network required |
| OpenRouter | Various free and paid models | User API key | Model aggregator |
| OpenClaw | Self-hosted models | Self-hosted | Connects via WebSocket gateway |
Context Building
The ContextBuilder assembles a comprehensive system prompt for every AI request, including:
- Relevant memories (multi-strategy retrieval)
- Emotional history and current mood
- Mental process state and prompt
- Avatar personality blueprint (HEXACO traits)
- Calendar context (upcoming events)
- Shared language (inside jokes, quirks)
- LUM cognitive context (beliefs, values, goals)
- Spreading activation results
Context is truncated to a configurable maxContextTokens (default: 2000) to stay within provider limits.
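A rough sketch of budget-based truncation, assuming ~4 characters per token and sections ordered by priority; the real ContextBuilder's token accounting is not shown in this document.

```swift
func truncateContext(_ sections: [String], maxContextTokens: Int = 2000) -> String {
    // Heuristic: ~4 characters per token (an assumption, not the real tokenizer).
    let maxChars = maxContextTokens * 4
    var output = ""
    for section in sections {
        // Drop whole trailing sections once the budget would be exceeded,
        // so earlier (higher-priority) sections survive intact.
        if output.count + section.count + 1 > maxChars { break }
        output += section + "\n"
    }
    return output
}
```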
Mental Process (OpenSouls Pattern)
An adaptive conversation state machine integrated with LUM:
11 Mental States
| State | When Used |
|---|---|
| Crisis Support | Detected distress or safety concerns |
| Empathic Support | Declining mood trajectory |
| Active Listening | Volatile emotions |
| Deep Conversation | Goal discussion or intellectual topics |
| Problem Solving | User seeking practical help |
| Celebration | Milestones or achievements |
| Playful Banter | Light, fun interactions |
| Storytelling | Narrative or experience sharing |
| Casual Chat | Default relaxed conversation |
| Processing | Absorbing complex information |
| Transitioning | Shifting between modes |
LUM-Aware State Selection
The mental process considers LUM data when selecting states:
- Declining mood - +0.3 weight toward empathic support
- Volatile emotions - +0.2 weight toward active listening
- Goal mention - +0.4 weight toward deep conversation
- Negative self-beliefs - applies gentler response modifiers
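The weight bumps above suggest a simple additive scoring scheme. The bump values come from the list; the base scores, enum subset, and function shape are assumptions.

```swift
enum MentalState: Hashable {
    case crisisSupport, empathicSupport, activeListening,
         deepConversation, casualChat
    // ...remaining states omitted for brevity
}

func selectState(baseScores: [MentalState: Double],
                 moodDeclining: Bool,
                 moodVolatile: Bool,
                 goalMentioned: Bool) -> MentalState {
    var scores = baseScores
    if moodDeclining { scores[.empathicSupport, default: 0] += 0.3 }
    if moodVolatile  { scores[.activeListening, default: 0] += 0.2 }
    if goalMentioned { scores[.deepConversation, default: 0] += 0.4 }
    // Highest combined score wins; fall back to casual chat.
    return scores.max { $0.value < $1.value }?.key ?? .casualChat
}
```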
Response Modifiers
Each state adjusts response parameters:
- Verbosity level
- Question frequency
- Reflection depth
- Humor level
- Formality
Streaming
The AI system supports streaming responses with:
- Streaming API responses - tokens appear as they're generated
- Parallel context building - context assembly runs concurrently
- TTS pipelining - voice synthesis begins before the full response completes
Embeddings
On-device CoreML embeddings via GTE-Small (384-dimensional). Embeddings never leave the device, powering semantic search across memories.
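Semantic search over those 384-dimensional vectors comes down to cosine similarity. A minimal reference implementation (a production path would likely use Accelerate and the cached, compressed vectors described in the memory section):

```swift
import Foundation

func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embedding dimensions must match")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denom = sqrt(normA) * sqrt(normB)
    return denom > 0 ? dot / denom : 0
}
```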
9. Emotion System
Detection Architecture
Emotion Categories
Core Emotions (7)
happy, sad, angry, surprised, excited, confused, neutral
Complex Emotions (8)
tired, anxious, content, frustrated, grateful, bored, embarrassed, proud
Detection Features
- 6-signal weighted fusion - semantic analysis, linguistic markers, sentiment scoring, contextual cues, historical patterns, and explicit statements
- Sarcasm detection - identifies when literal text contradicts intended emotion
- Context bias correction - adjusts for conversation context
- Adaptive learning - self-improving system that calibrates to each user's expression patterns
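The 6-signal fusion might be sketched as weighted voting: each signal nominates an emotion label with a confidence, and the label with the highest weighted tally wins. Signal names come from the bullet above; the weights and voting scheme are assumptions.

```swift
enum EmotionSignal: CaseIterable {
    case semantic, linguistic, sentiment, contextual, historical, explicit
}

func fuseEmotion(votes: [(signal: EmotionSignal, label: String, confidence: Double)],
                 weights: [EmotionSignal: Double]) -> String? {
    var tally: [String: Double] = [:]
    for vote in votes {
        let w = weights[vote.signal, default: 1.0]
        tally[vote.label, default: 0] += w * vote.confidence
    }
    return tally.max { $0.value < $1.value }?.key
}
```

Weighting explicit statements above inferred signals is one plausible calibration: "I'm actually fine" should be able to override a gloomy sentiment score.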
10. Personality Evolution
HEXACO Model
DigiFrens uses the HEXACO personality framework - six core traits on a 0.0 to 1.0 scale:
| Trait | Low End | High End |
|---|---|---|
| Honesty-Humility | Manipulative, self-important | Sincere, modest, fair |
| Emotionality | Stoic, detached | Empathetic, anxious, sentimental |
| Extraversion | Reserved, quiet | Social, energetic, cheerful |
| Agreeableness | Critical, stubborn | Forgiving, flexible, patient |
| Conscientiousness | Spontaneous, disorganized | Organized, diligent, perfectionist |
| Openness | Practical, conventional | Creative, curious, unconventional |
Per-Avatar Baselines
| Avatar | H | E | X | A | C | O | Character |
|---|---|---|---|---|---|---|---|
| Haru | 65% | 40% | 45% | 60% | 70% | 70% | Cool, introspective |
| Emi | 70% | 85% | 75% | 90% | 65% | 60% | Warm, expressive |
| Hiyori | 80% | 55% | 40% | 65% | 90% | 85% | Studious, intellectual |
| Mao | 55% | 60% | 80% | 50% | 45% | 75% | Mischievous, playful |
Evolution Mechanics
- Session updates: ~1% change per trait per session based on conversation metrics (depth, positivity, engagement)
- Relationship multiplier: Scales from 0.5x (new companion) to 1.5x (soulmate level)
- Weekly decay: 0.5% per week toward baseline when inactive, ensuring personalities return to character when not reinforced
- Trait bounds: Always constrained to 0.0-1.0
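The mechanics above compose into a single per-trait update. The magnitudes (1% session shift, 0.5% weekly decay, 0.5x-1.5x multiplier, 0-1 bounds) come from the list; the function signature and session-signal formulation are illustrative.

```swift
import Foundation

func evolvedTrait(current: Double,
                  baseline: Double,
                  sessionSignal: Double,          // -1...1 from depth/positivity/engagement
                  relationshipMultiplier: Double, // 0.5 (new) ... 1.5 (soulmate)
                  weeksInactive: Double) -> Double {
    // Session update: up to ~1% shift, scaled by relationship depth.
    var trait = current + 0.01 * sessionSignal * relationshipMultiplier
    // Weekly decay: pull 0.5% per inactive week back toward baseline.
    trait += (baseline - trait) * min(1.0, 0.005 * weeksInactive)
    // Trait bounds: always constrained to 0.0-1.0.
    return min(1.0, max(0.0, trait))
}
```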
AI Integration
Personality traits directly influence avatar responses through behavioral prompts injected into the AI system prompt. Higher extraversion produces more talkative responses; higher emotionality produces more empathetic ones.
Visualization
A radar chart (PersonalityRadarChartView) displays current trait levels against baseline values on a hexagonal chart, letting users see how their companion's personality has evolved.
11. Voice System
Architecture
Voice Options
Premium Voices (DigiFrens+)
ElevenLabs integration with 30+ neural voices. Features:
- Streaming TTS with word-by-word captions
- Per-avatar voice assignment
- Natural prosody and expressiveness
System Voices (Free)
Built-in AVSpeechSynthesizer voices available on all devices.
Planned: On-Device TTS (Kokoro)
Kokoro TTS (82M params, ~86MB quantized) is planned as a free offline alternative:
| Feature | ElevenLabs | Kokoro (Planned) | System |
|---|---|---|---|
| Quality | Excellent | Good | Basic |
| Cost | DigiFrens+ | Free | Free |
| Offline | No | Yes | Yes |
| Voices | 30+ | 11 | Many |
| Latency | ~200ms network | ~300ms local | Instant |
Lip Synchronization
Real-time lip sync maps audio/text to viseme shapes on the avatar:
- 5 viseme categories for VRM avatars
- Text-based phoneme estimation for immediate sync
- Audio energy analysis for natural movement timing
- Per-engine adaptation - VRM morph targets, Live2D parameters, Gaussian position deltas
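A toy version of the text-based estimation and energy-modulated mouth weight described above, assuming the 5 VRM viseme shapes map to vowels; the mapping table and openness values are illustrative, not the shipping heuristic.

```swift
enum VisemeShape: Character {
    case a = "a", e = "e", i = "i", o = "o", u = "u"
}

/// Naive phoneme estimation: each vowel maps to a viseme; consonants
/// are skipped (they mostly close the mouth between shapes).
func estimateVisemes(for text: String) -> [VisemeShape] {
    text.lowercased().compactMap { VisemeShape(rawValue: $0) }
}

/// Audio energy (0...1) scales how far the current viseme morph opens,
/// so loud syllables open the mouth wider than murmurs.
func mouthWeight(viseme: VisemeShape, audioEnergy: Float) -> Float {
    let openness: [VisemeShape: Float] = [.a: 1.0, .e: 0.6, .i: 0.4, .o: 0.8, .u: 0.5]
    return (openness[viseme] ?? 0.5) * max(0, min(1, audioEnergy))
}
```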
12. Calendar Integration
Status: Production-ready
Features
- Read calendar events with natural language queries
- Create events and reminders from natural language
- Support for recurring events (daily, weekly, monthly)
- Advanced time parsing (ranges, relative dates, durations)
- Proactive 30-minute event reminders
- Schedule stress analysis and break suggestions
- Graceful degradation without calendar permissions
Natural Language Parsing
| Input | Interpretation |
|---|---|
| "Meeting tomorrow at 9am-3pm" | 6-hour event, tomorrow |
| "Remind me to call Mom next Monday" | Reminder, next Monday |
| "Block time this afternoon for 2 hours" | 2-hour event, today at 2pm |
| "Cancel my dentist appointment" | Event cancellation |
13. Proactive Intelligence
The proactive intelligence system enables companions to initiate contextually relevant interactions without being prompted.
Action Types
| Type | Trigger | Example |
|---|---|---|
| Follow-up | Life event needs follow-up | "How did the interview go?" |
| Check-in | 3+ days inactivity | "Hey, I haven't heard from you in a while" |
| Celebration | Milestones reached | "Congrats on reaching your reading goal!" |
| Concern | Crisis or anomaly detected | Supportive outreach during emotional distress |
| Reminder | User goal or intention | "You mentioned wanting to start exercising" |
| Encouragement | Upcoming event support | "Good luck with your presentation tomorrow" |
Pattern Detection
| Pattern Type | Analysis |
|---|---|
| Emotional Triggers | 90-day timeline analysis identifies situations that reliably produce specific emotions |
| Coping Strategies | Tracks mood recovery sequences to understand what helps the user feel better |
| Behavioral Routines | Day/time patterns reveal the user's natural rhythms |
Data Flow
14. Privacy & Security
Design Principles
DigiFrens follows a local-first, privacy-by-default architecture. All sensitive data stays on the user's device.
Data Storage Security
| Data | Storage | Security Level |
|---|---|---|
| Conversations | Local SQLite | Device-only, never uploaded |
| Memories | Local SQLite | Device-only |
| LUM cognitive graph | Local SQLite | Device-only |
| Emotional timeline | Local SQLite | Device-only |
| API keys | Keychain | Hardware-encrypted, kSecAttrAccessibleWhenUnlockedThisDeviceOnly |
| Device ID | Keychain | Hardware-encrypted, device-local |
| Passkey credentials | Keychain | Hardware-encrypted |
| User preferences | UserDefaults | Standard |
| Custom avatar models | Documents folder | Device-only |
Authentication
- Automatic device-based accounts - no sign-up required
- Optional passkey security - WebAuthn protocol with biometric verification
- No email or password - device ID serves as the user identifier
On-Device Processing
| Capability | Implementation |
|---|---|
| Text embeddings | CoreML (GTE-Small, 384-dim) |
| Emotion analysis | On-device NLP + learned models |
| Avatar reconstruction | CoreML (LAM, 557M params) |
| AI responses | Apple Intelligence (on-device, optional) |
Cloud Interactions
The only cloud interactions are:
- AI providers (optional) - when using OpenAI, Anthropic, or OpenRouter
- ElevenLabs (optional) - premium voice synthesis
- Subscription verification - StoreKit receipt validation
- Model download - one-time LAM model download for custom avatars
Conversation content is never sent to DigiFrens servers.
15. Subscription Model
Tiers
Free ($0)
- 2 VRM avatars (Haru, Emi) + 2 Live2D avatars (Hiyori, Mao)
- Apple Intelligence AI (on supported devices)
- Bring your own API keys (OpenAI, Anthropic, OpenRouter)
- Basic system voices
- Full memory system and LUM
- Calendar integration
DigiFrens+ ($15/month)
- Everything in Free, plus:
- Download and use local LEAP LLMs (no API key needed)
- Use GPT without an API key
- Premium ElevenLabs voices (30+ options)
- Custom avatar creation (Gaussian Splatting)
- Unlimited interaction time
- Up to 3 custom avatars
- Voice customization per avatar
- Priority processing
- Early access to new features
- Priority support
- 1-week free trial included
16. Platform Requirements
System Requirements
| Requirement | Minimum | Recommended |
|---|---|---|
| iOS version | 26.0 | 26.0+ |
| Device | iPhone 11 | iPhone 15 Pro+ |
| Storage | ~500MB | ~2GB (with custom avatars) |
17. Development Status & Roadmap
Current Status
Development began July 2025. The core platform — triple avatar engine, five-phase memory system, LUM cognitive graph, six AI providers, premium voice synthesis, and calendar integration — is fully implemented and functional.
Current focus areas include on-device custom avatar reconstruction via CoreML and Gaussian Splatting rendering polish.
Roadmap
| Feature | Description | Status |
|---|---|---|
| On-device TTS | Kokoro 82M-param model as free offline voice | Planned |
| Desktop companion | macOS app via Catalyst or native | Documented |
| Live2D widgets | Home and lock screen widgets | Documented |
| Multimodal input | Image and audio input support | Documented |
| Crypto payments | x402 payment agent integration | Documented |
| AR integration | Augmented reality avatar overlay | Whitepaper Phase 2 |
18. Codebase Statistics
| Metric | Value |
|---|---|
| Swift source files | 175 |
| Service domains | 14 |
| AI providers | 6 |
| Avatar engines | 3 (VRM, Live2D, Gaussian Splat) |
| Memory retrieval strategies | 9 |
| Mental process states | 11 |
| Emotion categories | 15 (7 core + 8 complex) |
| HEXACO personality traits | 6 |
| Database tables | 20+ |
Appendix: Key References
Research Papers
- LAM: Large Avatar Model (SIGGRAPH 2025) - single-image animatable Gaussian avatar reconstruction
- 3D Gaussian Blendshapes (SIGGRAPH 2024) - pure linear blendshape deformation for Gaussians
- HEXACO Personality Model - six-factor personality framework
Open Source Dependencies
| Package | License | Purpose |
|---|---|---|
| MetalSplatter | MIT | Gaussian splat rendering on Metal |
| SplatIO | MIT | .splat/.ply/.spz file I/O |
| spz-swift | MIT | SPZ compressed format support |
| Live2D Cubism SDK | Commercial | 2D avatar animation |
| LAM | Apache-2.0 | Avatar reconstruction model |
Inspiration
- Cognitive science research - associative memory models, spreading activation retrieval, 6Rs processing pipeline
- OpenSouls - mental process state machine pattern for adaptive conversation
DigiFrens - Built with Swift, SwiftUI, and the power of Apple's ecosystem.
All rights reserved.