
DigiFrens Technical Documentation

Version 3.1 | April 2026


1. Executive Summary

DigiFrens is a sophisticated iOS AI companion application that combines animated avatars, intelligent memory systems, and natural voice interaction to create meaningful digital relationships. Unlike conventional AI assistants optimized for productivity, DigiFrens is designed around emotional connection - companions that remember, evolve, and respond to users as individuals.

The application features a triple avatar engine (3D VRM, 2D Live2D, and photorealistic Gaussian Splatting), a five-phase emotional memory system with cognitive graph integration, six AI providers including free on-device Apple Intelligence, and premium voice synthesis. All conversation data is stored locally on the user's device, with no cloud dependency for core functionality.

Key Differentiators:

  • Privacy-first architecture - conversations never leave the device
  • Living User Model (LUM) - a cognitive graph that models the user's beliefs, values, and goals
  • Cognitive Memory Pipeline - spreading activation retrieval inspired by human associative memory
  • On-device avatar creation - single selfie to photorealistic animated avatar via CoreML
  • Adaptive conversation intelligence - 11-state mental process that responds to emotional context

2. Product Overview

What is DigiFrens?

DigiFrens creates AI entities with visual presence, unique voices, persistent memories, and consistent personalities. Each companion provides judgment-free interaction and emotional support through natural conversation.

Core Experience

Users interact with animated avatar companions through text or voice. Each conversation is enriched by:

  • Visual presence - avatars express emotions in real-time through facial expressions, gestures, and idle animations
  • Voice interaction - premium neural voice synthesis with synchronized lip movements
  • Persistent memory - companions remember personal details, past conversations, and emotional patterns
  • Personality evolution - avatar traits shift based on conversation dynamics over time
  • Proactive intelligence - companions initiate follow-ups, check-ins, and celebrations based on life events

User Flow

  1. Launch - automatic device-based authentication (no sign-up required)
  2. Landing page - horizontal carousel of available avatars with recent conversation previews
  3. Avatar selection - tap to start or resume a conversation
  4. Conversation - text or voice input with real-time avatar responses
  5. Settings - configure AI provider, voice, subscription, and security

Available Companions

Name   | Engine         | Personality
Haru   | VRM (3D)       | Cool, introspective, main character energy
Emi    | VRM (3D)       | Kind, cheerful, playful, optimistic
Hiyori | Live2D (2D)    | Cheerful, energetic, bubbly, studious
Mao    | Live2D (2D)    | Mischievous, playful, witty, curious
Custom | Gaussian Splat | User-created from selfie, personality selectable

3. Technical Architecture

Platform

  • iOS 26.0+ (iPhone 11 or newer)
  • Swift 6.0+ / Xcode 16.0+
  • iPhone 15 Pro+ for Apple Intelligence features
  • ~500MB storage (app + models + CoreML cache)

Architecture

The codebase follows MVVM architecture with 14 service domains covering AI, memory, emotion, voice, calendar, avatar management, and more. Three dedicated engine modules handle avatar rendering (VRM, Live2D, Gaussian Splatting), each conforming to a shared protocol for unified expression control.

Key Technologies

Technology        | Purpose
SwiftUI           | Declarative UI with @Observable macro
SceneKit          | VRM 3D rendering (Metal backend)
Live2D Cubism SDK | 2D avatar animation (Metal renderer)
MetalSplatter     | GPU-accelerated Gaussian splat rendering (Metal)
CoreML            | On-device embeddings (GTE-Small) + avatar reconstruction (LAM + FLAME)
Foundation Models | Apple Intelligence on-device AI (iOS 26+)
SQLite            | Local memory and LUM graph persistence
StoreKit 2        | Subscription management
EventKit          | Calendar integration
Vision            | Face detection for avatar capture
AVFoundation      | Audio recording, camera, speech

Conversation Flow

The message-to-response pipeline:

  1. User input - the user sends a message
  2. ConversationViewModel - stores the memory (returns a UUID), analyzes emotion (user state), extracts LUM data (beliefs, values, goals), and creates memory → LUM edges
  3. ContextBuilder (parallel assembly) - LUM context (beliefs, mood, chapter), mental process (OpenSouls states), memory retrieval (9 strategies), calendar (upcoming events), personality (HEXACO traits), plus relevant memories, emotional history, shared language, and the avatar blueprint
  4. AIService - sends system prompt + context to the selected provider (Apple Intelligence, OpenAI, Claude, Local LEAP, OpenRouter, or OpenClaw), which returns a response + emotion
  5. Avatar response - voice synthesis (ElevenLabs or system), lip sync (viseme synchronization), and emotion expression (VRM / Live2D / Gaussian)

Data Layer

Store        | Contents                                                       | Security
SQLite       | Memories, LUM graph, emotions, sessions, personality evolution | Device-local
Keychain     | API keys, device ID, passkey credentials                       | Hardware-encrypted
UserDefaults | Preferences, settings                                          | Standard

4. Avatar System

DigiFrens features a triple-engine avatar system, with each engine optimized for a different use case.

Engine Architecture

The three engines and their avatars:

  • VRM Engine (SceneKit + Metal) - Haru, Emi (~35MB)
  • Live2D Engine (Cubism SDK + Metal) - Hiyori, Mao (~32MB)
  • Gaussian Splat Engine (MetalSplatter) - Custom (selfie, ~1.2GB model)

All three engines conform to AvatarEngineProtocol, providing a unified interface for expression control, lip sync, and idle animation.
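The shared interface can be pictured as a small Swift protocol. This is an illustrative sketch only - the protocol name comes from the docs, but the method names, the AvatarEmotion enum, and the StubEngine conformance are invented for the example:

```swift
import Foundation

// Sketch of the unified engine interface described above. Method names
// and signatures are illustrative assumptions, not the actual DigiFrens API.
enum AvatarEmotion: String {
    case happy, sad, angry, surprised, excited, confused, neutral
}

protocol AvatarEngineProtocol {
    /// Blend toward an emotion expression over `duration` seconds.
    func setExpression(_ emotion: AvatarEmotion, duration: TimeInterval)
    /// Drive the mouth with a viseme index (0..<5) and weight (0...1).
    func setViseme(index: Int, weight: Float)
    /// Advance idle animation (breathing, blinking, sway) by one frame.
    func updateIdle(deltaTime: TimeInterval)
}

// A stub conformance shows how any of the three engines plugs in.
final class StubEngine: AvatarEngineProtocol {
    private(set) var currentEmotion: AvatarEmotion = .neutral
    func setExpression(_ emotion: AvatarEmotion, duration: TimeInterval) {
        currentEmotion = emotion
    }
    func setViseme(index: Int, weight: Float) {}
    func updateIdle(deltaTime: TimeInterval) {}
}
```

Because all three engines share this surface, the conversation layer can request an expression or viseme without knowing which renderer is active.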

VRM Engine (3D)

Avatars: Haru, Emi (~35MB total)

The VRM engine renders 3D avatar models using SceneKit with a Metal backend. It provides:

  • 60 FPS unified animation loop with layered blending
  • 7 emotion expressions (happy, sad, angry, surprised, excited, confused, neutral)
  • 5 viseme shapes for lip synchronization
  • Idle animations - breathing, blinking, head movement, body sway
  • Physics simulation - hair and clothing dynamics
  • NSCache with 100MB byte limit and automatic memory pressure handling

Live2D Engine (2D)

Avatars: Hiyori, Mao (~32MB)

The Live2D engine uses the Cubism SDK with a Metal renderer (migrated from OpenGL ES for iOS 26 compatibility). It features:

  • Emotion-driven facial animations with parameter mapping
  • Swift-to-C++ bridge via Objective-C++ (Live2DBridge.mm)
  • Metal rendering via Live2DMetalView

Gaussian Splatting Engine (Photorealistic Custom Avatars)

Status: In development (core pipeline complete)

The Gaussian Splatting engine enables users to create photorealistic animated avatars from a single selfie photo. The pipeline combines three key technologies: FLAME, a statistical 3D morphable face model, provides structured facial geometry and an animation rig; LAM (Large Avatar Model), a feed-forward transformer, predicts animatable 3D Gaussians from a single image; and a Metal renderer rasterizes the Gaussians at real-time frame rates on Apple hardware.

How It Works: Single Photo to Animatable 3D Avatar

The reconstruction pipeline runs as a single forward pass through the LAM network — no iterative optimization, no multi-view capture, no cloud processing.

Step 1 — Feature Extraction. A frozen DINOv2 vision backbone extracts multi-scale image features from the input photo. Shallow layers capture local detail (skin texture, hair, wrinkles) while deep layers encode global facial structure.

Step 2 — FLAME-Guided Query Construction. FLAME's 5,023 mesh vertices are subdivided twice to produce 81,424 query points that define a dense, topologically consistent sampling of the face and head. Each point is positionally encoded and projected through an MLP into a learnable feature vector. These structured queries give the transformer a strong geometric prior — it knows where to look on the face rather than predicting structure from scratch.

Step 3 — Transformer Cross-Attention. 10 stacked transformer blocks (16 heads, 1024-dim) perform cross-attention between the FLAME-derived point queries and the image features. Each block refines the point representations, progressively resolving identity, texture, and fine geometric detail.

Step 4 — Gaussian Attribute Prediction. MLP decoder heads predict per-point 3D Gaussian attributes in canonical (neutral expression, forward-facing) space:

Attribute       | Dimensions | Purpose
Color (RGB)     | 3          | Appearance
Opacity         | 1          | Transparency / density
Scale           | 1          | Gaussian extent
Rotation        | SO(3)      | Gaussian orientation
Position offset | 3          | Refinement beyond FLAME template (captures hair, accessories, wrinkles)

Step 5 — Animation. Standard FLAME linear blend skinning (LBS) with corrective blendshapes deforms the canonical Gaussians to any target expression and head pose. No neural network runs at animation time — it is purely the classical FLAME skinning pipeline applied to the predicted Gaussians. Expression (100 blendshape coefficients) and pose (jaw, neck, eyeball joints) parameters drive the deformation:

G_canonical = FLAME_template + shape_blendshapes(β) + predicted_offsets
G_posed = LBS(G_canonical, joints, pose_θ, expression_φ, blend_weights)

Step 6 — Metal Rendering. The posed Gaussians are rasterized via MetalSplatter using Apple's Metal graphics API, achieving 30-60 FPS on-device with GPU-accelerated splatting.

On-Device Reconstruction Pipeline

The on-device pipeline from selfie to avatar:

  1. Input - single selfie photo (518×518, center-cropped)
  2. CoreML inference (557.6M params) - DINOv2 ViT-L/14 extracts multi-scale image features, a 10-layer SD3-style transformer decodes them, and the GSLayer MLP emits 20,018 Gaussians × 14 channels (LAM: Large Avatar Model, SIGGRAPH 2025, Apache-2.0, FP16: 1,214.6 MB)
  3. Export - compressed Gaussians written to .spz format
  4. Blendshape generation - flame_arkit_mapping.json maps 52 ARKit blendshapes; FLAME LBS position deltas yield per-Gaussian deformations
  5. Output - GaussianSplatEngine renders at 60 FPS

Why FLAME as the Geometric Prior

FLAME (Faces Learned with an Articulated Model and Expressions) is a statistical 3D morphable face model trained on 33,000+ 3D scans. It decomposes facial variation into three disentangled components:

  • Shape (300 PCA components) — identity-specific geometry (skull shape, nose size, jaw width)
  • Expression (100 blendshape components) — facial expressions (smile, frown, surprise)
  • Pose (articulated joints) — jaw rotation, neck articulation, eyeball gaze

This factored representation means identity, expression, and pose can be varied independently. By reconstructing all Gaussians into FLAME's canonical coordinate space, the network avoids entangling identity with expression — a user's neutral face is always reconstructed the same way regardless of their expression in the input photo.

FLAME also provides dense correspondence: all reconstructed avatars share the same mesh topology, enabling consistent animation control, texture transfer, and blend weight inheritance across every avatar created in the app.

Why 3D Gaussian Splatting

Unlike mesh-based or NeRF-based representations, 3D Gaussian Splatting offers a unique combination of properties ideal for mobile avatar rendering:

  • Explicit representation — each Gaussian has concrete position, color, opacity, and shape parameters that can be directly manipulated by the FLAME animation rig
  • Real-time rasterization — Metal-accelerated splatting avoids the expensive ray marching required by neural radiance fields
  • Photorealistic quality — Gaussians naturally model soft boundaries (hair, skin pores, translucent regions) that triangle meshes struggle with
  • Compact storage — an avatar's 81,424 Gaussians with 14 channels per point stores efficiently on-device

Performance Comparison

Metric          | Previous (Cloud)       | Current (On-Device)
Input           | 60-second guided video | 1 selfie photo
Wait time       | 5-15 minutes           | ~1.4 seconds (single forward pass)
Network         | 200MB up, 10MB down    | Model download only (~1.2GB, one-time)
Cost per avatar | $0.50-1.50 (cloud GPU) | $0
Works offline   | No                     | Yes (after initial download)

Rendering Performance

Platform       | Reconstruction | Animation + Rendering
A100 GPU       | ~1.4s          | 281 FPS
MacBook M1 Pro |                | 120 FPS
iPhone 16      |                | 35 FPS
iPhone 15 Pro  |                | 30+ FPS (target)

The Metal renderer uses GPU-accelerated Gaussian splatting via MetalSplatter, with depth sorting, alpha compositing, and view-dependent rendering handled entirely on the GPU.

LAM Model Details

Property          | Value
Paper             | LAM: Large Avatar Model (SIGGRAPH 2025)
Authors           | He, Gu, Ye, Xu, Zhao, Dong, Yuan, Dong, Bo
License           | Apache-2.0
Parameters        | 557,642,254 (557.6M)
Backbone          | DINOv2 (frozen, multi-scale feature extraction)
Decoder           | 10 transformer blocks, 16 heads, 1024-dim
Input             | Single image (518×518)
Output            | 81,424 animatable 3D Gaussians × 14 channels
Animation rig     | FLAME LBS + corrective blendshapes
Geometric prior   | FLAME mesh (5,023 vertices, 2× subdivided)
Training data     | VFHQ (15,204 video clips, ~3M frames)
Model size (FP16) | 1,214.6 MB
CoreML conversion | FP16 quantized for on-device inference

Animation Pipeline

Gaussian avatars are animated through two complementary systems:

FLAME LBS (primary): The FLAME rig drives head pose (jaw, neck, eyeball joints) and expression via 100 blendshape coefficients. Each subdivided vertex inherits blend weights from its parent FLAME vertices, so animation is a direct matrix operation with no neural network inference. Audio-driven animation maps speech audio to FLAME expression parameters in real-time.

ARKit Blendshape Fallback: 52 ARKit blendshape weights drive per-Gaussian position deltas via the 3D Gaussian Blendshapes technique:

deformed_position[i] = neutral[i] + sum(weight_j * delta_j[i])

A region-based deformation system operates when precomputed deltas are unavailable (jaw, mouth, eyes, brows).
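The blendshape formula above maps directly to code. A minimal Swift sketch, assuming flat arrays of Gaussian positions - the function name and array shapes are illustrative, not the app's API:

```swift
// Sketch of per-Gaussian blendshape deformation:
//   deformed[i] = neutral[i] + sum_j(weight_j * delta_j[i])
// In the real pipeline there are 52 ARKit weights; here the shapes are
// generic: one delta field per blendshape, one delta per Gaussian.
func deformGaussians(
    neutral: [SIMD3<Float>],
    deltas: [[SIMD3<Float>]],   // deltas[j][i]: offset of Gaussian i for blendshape j
    weights: [Float]            // one weight per blendshape, typically 0...1
) -> [SIMD3<Float>] {
    var deformed = neutral
    // Skip zero-weight blendshapes; they contribute nothing.
    for (j, field) in deltas.enumerated() where weights[j] != 0 {
        let w = weights[j]
        for i in deformed.indices {
            deformed[i] += w * field[i]
        }
    }
    return deformed
}
```

Because the operation is a pure weighted sum, it parallelizes trivially on the GPU; the CPU version above is just the reference semantics.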

Custom Avatar Personalities

Users select from 6 built-in AI personalities when creating a custom avatar. Each personality includes a unique character primer that shapes the companion's conversational style and responses.


5. Memory System

The DigiFrens memory system is a five-phase architecture that goes far beyond simple conversation history. It models emotional patterns, detects behavioral routines, maintains shared language, and uses multiple retrieval strategies to surface the most relevant memories.

Architecture Overview

The five-phase system splits into a store path, a query path, and background maintenance:

  • Store path: user message → store memory (returns UUID) → GTE-Small embeddings → SQLite (Float16 + zlib) → LUM extractors create belief/value/goal edges
  • Query path: query → 9 retrieval strategies (semantic, emotional, spreading, ...) → LRU cache (20MB limit) → ranked results
  • Maintenance: consolidation at session end, pruning hourly
  • Latency: cache <1ms, DB 20-50ms, semantic 100-300ms, total 150-400ms

Five Phases

Phase 1: Emotional Timeline

Mood tracking with per-emotion baselines and anomaly detection. Each conversation turn records the user's emotional state, building a timeline that reveals patterns over days, weeks, and months.

Phase 2: Proactive Intelligence

Automated follow-ups, check-ins, celebrations, and crisis detection based on life events and emotional patterns. The system proactively surfaces relevant context without being asked.

Phase 3: Pattern Detection

Detection of behavioral routines (day/time patterns), emotional triggers (90-day timeline analysis), and coping strategies (mood recovery sequences).

Phase 4: Shared Language

Inside jokes, communication quirks, and shared experiences accumulate over time, giving each companion relationship a unique vocabulary and history.

Phase 5: Context Windows

Nine retrieval strategies ensure the most relevant memories surface for each conversation:

  1. Semantic - embedding similarity
  2. Topical - tag and category matching
  3. Emotional - mood-aligned retrieval
  4. Temporal - recent and time-relevant
  5. Recency - most recent interactions
  6. Importance - high-importance memories first
  7. Social - relationship-relevant memories
  8. Associative - linked memory chains
  9. Spreading Activation - graph-based BFS traversal (see Cognitive Memory Pipeline)

Storage & Performance

Metric                         | Value
Storage format                 | SQLite with Float16 + zlib compression
Compression ratio              | ~70% storage reduction
Cache                          | Thread-safe LRU, 20MB limit
Cache hit latency              | <1ms
DB query latency               | 20-50ms
Semantic search (500 memories) | 100-300ms
Total retrieval                | 150-400ms

6. Living User Model (LUM)

The Living User Model is a cognitive graph that models the user's mental landscape. It tracks what the user believes, values, and aspires to, creating a rich understanding that goes beyond surface-level conversation.

Graph Structure

Node Types

  • Beliefs - statements the user holds true (with confidence and valence scores)
  • Values - principles the user considers important (with categories)
  • Goals - objectives the user is working toward (with progress and status)
  • Emotional Triggers - situations that reliably produce emotional responses
  • Narrative Themes - recurring life narrative patterns
  • Emergent Types - new node types automatically discovered from conversation patterns

Edge Types

Edge Type       | Description
supports        | One node reinforces another
contradicts     | Nodes are in tension
triggers        | One node activates another
motivates       | One node drives another
leadsTo         | Narrative arc connection
coEntity        | Shared named entities
coSession       | Same conversation session
temporal        | Within 24-hour window
coTopic         | Shared tags/topics
semanticSimilar | Embedding similarity > 0.65

Key Features

Emergent Schema Learning

The system automatically detects new node and edge types from conversation patterns. When users discuss concepts that don't fit existing categories, the LUM proposes new schema elements through EmergentPredicateDetector.

Life Chapters

Automatic narrative arc detection segments the user's experience into chapters (e.g., "job transition," "new relationship," "health focus"). Each chapter provides context for how current conversations relate to broader life patterns.

Mood Trajectory

Real-time classification of emotional direction:

  • Improving - positive trend
  • Declining - negative trend (triggers empathic support)
  • Stable - consistent emotional state
  • Volatile - rapid emotional shifts (triggers active listening)
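A trajectory classifier over recent mood valence samples might look like the following sketch - the thresholds, window handling, and function name are invented for illustration:

```swift
import Foundation

enum MoodTrajectory { case improving, declining, stable, volatile }

// Illustrative classifier: given recent valence samples in [-1, 1],
// look at the average step direction and the largest single swing.
// The 0.6 and 0.1 thresholds are placeholder values for the sketch.
func classifyTrajectory(valences: [Double]) -> MoodTrajectory {
    guard valences.count >= 2 else { return .stable }
    // Consecutive differences: positive = mood rising.
    let diffs = zip(valences.dropFirst(), valences).map { $0.0 - $0.1 }
    let meanDelta = diffs.reduce(0, +) / Double(diffs.count)
    let swing = diffs.map(abs).max() ?? 0
    if swing > 0.6 { return .volatile }       // rapid shifts dominate
    if meanDelta > 0.1 { return .improving }
    if meanDelta < -0.1 { return .declining }
    return .stable
}
```

Volatility is checked first because a large swing should trigger active listening even when the average trend looks flat.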

Traversal Decay & Reinforcement

  • Unused connections decay with a 30-day half-life
  • Traversed edges are reinforced proportional to usage
  • Ensures the graph stays current and relevant
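These two rules can be sketched in Swift, assuming a per-edge weight in [0, 1]; the reinforcement step size is an invented placeholder, while the 30-day half-life is from the docs:

```swift
import Foundation

// Exponential decay with a 30-day half-life: an edge untouched for
// 30 days keeps half its weight, for 60 days a quarter, and so on.
let halfLifeDays = 30.0

func decayedWeight(_ weight: Double, idleDays: Double) -> Double {
    weight * pow(0.5, idleDays / halfLifeDays)
}

// Traversal reinforcement proportional to usage, clamped to 1.0.
// The 0.05 step per traversal is an illustrative assumption.
func reinforced(_ weight: Double, traversals: Int, step: Double = 0.05) -> Double {
    min(1.0, weight + Double(traversals) * step)
}
```

Together they form a use-it-or-lose-it dynamic: frequently traversed edges saturate near 1.0 while stale ones fade toward zero.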

Context Integration

LUM insights feed directly into AI prompts through LUMContext:

LUMContext carries beliefs (with confidence), values (ranked), goals (with progress), mood trajectory (direction), an identity summary, and the active life chapter, all injected into the system prompt for every AI response.

7. Cognitive Memory Pipeline

The Cognitive Memory Pipeline brings memories into the LUM cognitive graph as first-class nodes, connecting them via typed edges. It is inspired by cognitive science research on associative memory and a 6Rs processing pipeline (Record, Reduce, Reflect, Reweave, Verify, Rethink).

Three Capabilities

Memory Reweaving

Retroactively enriches older memories when new information arrives. Operates in three tiers:

Tier                | Trigger                | Latency | Scope
Tier 1: Inline      | Each new memory stored | ~50ms   | Entity overlap detection, tag updates, importance boost
Tier 2: Session-end | Conversation ends      | Seconds | Semantic similarity edges, narrative continuation, emotional reinterpretation
Tier 3: Deep scan   | Daily maintenance      | Minutes | Full graph analysis, cross-session patterns

Example: A user mentions "interview next week" in one conversation, then says "I got the job!" two weeks later. Reweaving links these memories via a .leadsTo narrative arc edge.

Knowledge Quality Pipeline (Verify/Rethink)

Systematic quality checks on stored knowledge:

  • Contradiction detection - flags beliefs that conflict with newer information
  • Staleness detection - identifies outdated information
  • Confidence decay - reduces certainty on old, unreinforced beliefs
  • Sentiment drift analysis - detects emotional valence changes via UnifiedEmotionAnalyzer
  • Findings routing - quality issues surface through proactive intelligence (max 3 per pass to prevent flooding)

Spreading Activation

The 9th retrieval strategy. Uses breadth-first graph traversal along cognitive edges to discover memories through associative structure rather than just embedding similarity. Activation decays with each hop, and multi-path convergence receives a boost — mirroring how human associative memory works.
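A minimal sketch of the traversal, assuming a simple adjacency map; the decay factor, hop limit, and accumulation rule are illustrative choices, not the production values:

```swift
import Foundation

// Spreading activation sketch: seed nodes start with energy 1.0, each
// hop multiplies energy by `decay`, and energy arriving at the same
// node via multiple paths accumulates (the "convergence boost").
func spreadActivation(
    from seeds: [String],
    edges: [String: [String]],   // node -> neighbors along cognitive edges
    decay: Double = 0.5,
    maxHops: Int = 3
) -> [String: Double] {
    var activation: [String: Double] = [:]
    var frontier: [(String, Double)] = seeds.map { ($0, 1.0) }
    for _ in 0..<maxHops {
        var next: [(String, Double)] = []
        for (node, energy) in frontier {
            activation[node, default: 0] += energy
            for neighbor in edges[node] ?? [] {
                next.append((neighbor, energy * decay))
            }
        }
        frontier = next
    }
    // Flush the final frontier so leaf nodes at maxHops still register.
    for (node, energy) in frontier { activation[node, default: 0] += energy }
    return activation
}
```

Ranking memories by accumulated activation surfaces items that embedding similarity alone would miss, because relevance flows through the graph's structure rather than through text alone.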

Data Flow

When a new memory is stored, it flows through the pipeline in stages:

  1. Edge creation - MemoryEdgeManager adds co_entity, co_session, temporal, and co_topic edges
  2. Tier 1 inline (~50ms) - MemoryReweaver performs entity overlap detection, tag updates, and importance boosts
  3. Session end - MemoryReweaver Tier 2 adds semantic edges, narrative continuation, and emotional reinterpretation
  4. Daily maintenance - MemoryReweaver Tier 3 deep scan plus KnowledgeQualityPipeline (verify + rethink): edge maintenance (7-day decay, 50K cap), contradictions, staleness, confidence, sentiment

Limits: edge cap 50,000 • reweave limit 3 per memory • entity index with O(1) lookup.

8. AI & Intelligence

Multi-Provider Architecture

DigiFrens supports six AI providers through a unified abstraction layer (AIService), giving users flexibility in cost, quality, and privacy.

Provider           | Models                             | Cost              | Notes
Apple Intelligence | 3B on-device model                 | Free              | No API key, works offline, iOS 26+ required
OpenAI             | GPT-4.1 Nano, Mini, GPT-4o         | User API key      | Cloud-based
Anthropic          | Claude Haiku 4.5, Sonnet 4.5, Opus | User API key      | Cloud-based
Local LEAP         | On-device LEAP SDK models          | DigiFrens+        | No network required
OpenRouter         | Various free and paid models       | User API key      | Model aggregator
OpenClaw           | Self-hosted models                 | WebSocket gateway | Self-hosted option

Context Building

The ContextBuilder assembles a comprehensive system prompt for every AI request, including:

  • Relevant memories (multi-strategy retrieval)
  • Emotional history and current mood
  • Mental process state and prompt
  • Avatar personality blueprint (HEXACO traits)
  • Calendar context (upcoming events)
  • Shared language (inside jokes, quirks)
  • LUM cognitive context (beliefs, values, goals)
  • Spreading activation results

Context is truncated to a configurable maxContextTokens (default: 2000) to stay within provider limits.
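Truncation can be sketched as greedy budget filling. The chars/4 token estimate and the priority ordering below are assumptions for illustration, not the actual ContextBuilder logic:

```swift
import Foundation

// Greedy context assembly under a token budget. Each section carries a
// priority; high-priority sections (e.g. mental process state) are
// admitted first, and anything that would overflow the budget is dropped.
// The chars/4 token estimate is a rough placeholder heuristic.
func buildContext(sections: [(priority: Int, text: String)],
                  maxContextTokens: Int = 2000) -> String {
    func tokens(_ s: String) -> Int { max(1, s.count / 4) }
    var budget = maxContextTokens
    var kept: [String] = []
    for section in sections.sorted(by: { $0.priority > $1.priority }) {
        let cost = tokens(section.text)
        if cost <= budget {
            kept.append(section.text)
            budget -= cost
        }
    }
    return kept.joined(separator: "\n\n")
}
```

Dropping whole sections, rather than clipping them mid-sentence, keeps each included block coherent for the model.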

Mental Process (OpenSouls Pattern)

An adaptive conversation state machine integrated with LUM:

11 Mental States

State             | When Used
Crisis Support    | Detected distress or safety concerns
Empathic Support  | Declining mood trajectory
Active Listening  | Volatile emotions
Deep Conversation | Goal discussion or intellectual topics
Problem Solving   | User seeking practical help
Celebration       | Milestones or achievements
Playful Banter    | Light, fun interactions
Storytelling      | Narrative or experience sharing
Casual Chat       | Default relaxed conversation
Processing        | Absorbing complex information
Transitioning     | Shifting between modes

LUM-Aware State Selection

The mental process considers LUM data when selecting states:

  • Declining mood - +0.3 weight toward empathic support
  • Volatile emotions - +0.2 weight toward active listening
  • Goal mention - +0.4 weight toward deep conversation
  • Negative self-beliefs - applies gentler response modifiers
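The weight adjustments above can be sketched as a scoring pass; the struct, key names, and base-score dictionary below are invented for the example:

```swift
import Foundation

// Hypothetical LUM signal bundle consumed by state selection.
struct LUMSignals {
    var moodDeclining = false
    var moodVolatile = false
    var goalMentioned = false
}

// Apply the documented weight bumps on top of base state scores.
// State keys are illustrative names, not the app's identifiers.
func stateWeights(base: [String: Double], lum: LUMSignals) -> [String: Double] {
    var w = base
    if lum.moodDeclining { w["empathicSupport", default: 0] += 0.3 }
    if lum.moodVolatile  { w["activeListening", default: 0] += 0.2 }
    if lum.goalMentioned { w["deepConversation", default: 0] += 0.4 }
    return w
}
```

The state with the highest resulting weight would win selection, so a goal mention (+0.4) outranks a declining mood (+0.3) when both fire at once.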

Response Modifiers

Each state adjusts response parameters:

  • Verbosity level
  • Question frequency
  • Reflection depth
  • Humor level
  • Formality

Streaming

The AI system supports streaming responses with:

  • Streaming API responses - tokens appear as they're generated
  • Parallel context building - context assembly runs concurrently
  • TTS pipelining - voice synthesis begins before the full response completes

Embeddings

On-device CoreML embeddings via GTE-Small (384-dimensional). Embeddings never leave the device, powering semantic search across memories.
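Semantic search over these embeddings reduces to cosine similarity ranking. A self-contained sketch - the function names and memory tuple shape are illustrative:

```swift
import Foundation

// Cosine similarity between two equal-length embedding vectors
// (384-dim in the GTE-Small case; any length works here).
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    var dot: Float = 0, na: Float = 0, nb: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    return dot / (sqrt(na) * sqrt(nb) + 1e-9)   // epsilon avoids div-by-zero
}

// Rank stored memories against a query embedding, returning the top-k IDs.
func topMemories(query: [Float],
                 memories: [(id: String, embedding: [Float])],
                 k: Int = 5) -> [String] {
    memories
        .map { ($0.id, cosineSimilarity(query, $0.embedding)) }
        .sorted { $0.1 > $1.1 }
        .prefix(k)
        .map { $0.0 }
}
```

In production this brute-force scan is what the 100-300ms "semantic search (500 memories)" figure measures; the LRU cache short-circuits repeats.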


9. Emotion System

Detection Architecture

User text passes through a six-signal fusion (semantic, linguistic, sentiment, contextual, historical, explicit), refined by sarcasm detection, context bias correction, and adaptive learning, to produce an EmotionalState spanning 7 core and 8 complex emotions. The result drives complex emotion mapping, avatar expressions, and timeline storage via UnifiedEmotionAnalyzer, AdaptiveEmotionLearner, EmotionalTimelineManager, and MemoryGraphV2.

Emotion Categories

Core Emotions (7)

happy, sad, angry, surprised, excited, confused, neutral

Complex Emotions (8)

tired, anxious, content, frustrated, grateful, bored, embarrassed, proud

Detection Features

  • 6-signal weighted fusion - semantic analysis, linguistic markers, sentiment scoring, contextual cues, historical patterns, and explicit statements
  • Sarcasm detection - identifies when literal text contradicts intended emotion
  • Context bias correction - adjusts for conversation context
  • Adaptive learning - self-improving system that calibrates to each user's expression patterns

10. Personality Evolution

HEXACO Model

DigiFrens uses the HEXACO personality framework - six core traits on a 0.0 to 1.0 scale:

Trait             | Low End                      | High End
Honesty-Humility  | Manipulative, self-important | Sincere, modest, fair
Emotionality      | Stoic, detached              | Empathetic, anxious, sentimental
Extraversion      | Reserved, quiet              | Social, energetic, cheerful
Agreeableness     | Critical, stubborn           | Forgiving, flexible, patient
Conscientiousness | Spontaneous, disorganized    | Organized, diligent, perfectionist
Openness          | Practical, conventional      | Creative, curious, unconventional

Per-Avatar Baselines

Avatar | HEXACO (H / E / X / A / C / O)    | Character
Haru   | 65% / 40% / 45% / 60% / 70% / 70% | Cool, introspective
Emi    | 70% / 85% / 75% / 90% / 65% / 60% | Warm, expressive
Hiyori | 80% / 55% / 40% / 65% / 90% / 85% | Studious, intellectual
Mao    | 55% / 60% / 80% / 50% / 45% / 75% | Mischievous, playful

Evolution Mechanics

  • Session updates: ~1% change per trait per session based on conversation metrics (depth, positivity, engagement)
  • Relationship multiplier: Scales from 0.5x (new companion) to 1.5x (soulmate level)
  • Weekly decay: 0.5% per week toward baseline when inactive, ensuring personalities return to character when not reinforced
  • Trait bounds: Always constrained to 0.0-1.0
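The update and decay rules can be sketched as follows, assuming "0.5% per week" means 0.5% of the remaining gap to baseline per inactive week (the exact formula is an assumption for illustration):

```swift
import Foundation

// Per-session trait update: sessionDelta is roughly ±0.01 per session,
// scaled by the relationship multiplier (0.5x new to 1.5x soulmate),
// then clamped to the documented 0.0...1.0 bounds.
func evolveTrait(current: Double, sessionDelta: Double,
                 relationshipMultiplier: Double) -> Double {
    min(1.0, max(0.0, current + sessionDelta * relationshipMultiplier))
}

// Weekly decay toward baseline while inactive: close a fraction of the
// gap each week so personalities drift back toward character.
func weeklyDecay(current: Double, baseline: Double,
                 inactiveWeeks: Double) -> Double {
    current + (baseline - current) * min(1.0, 0.005 * inactiveWeeks)
}
```

The clamp and the pull toward baseline together keep traits stable: no amount of conversation pushes a trait past its bounds, and neglect never overshoots the baseline.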

AI Integration

Personality traits directly influence avatar responses through behavioral prompts injected into the AI system prompt. Higher extraversion produces more talkative responses; higher emotionality produces more empathetic ones.

Visualization

A radar chart (PersonalityRadarChartView) displays current trait levels against baseline values on a hexagonal chart, letting users see how their companion's personality has evolved.


11. Voice System

Architecture

speakResponse() in ConversationViewModel routes elevenlabs_* voices to streaming TTS (audio chunks) and system voices to VoiceService (AVSpeech). Both paths feed lip synchronization: 5 viseme shapes, phoneme estimation, audio energy analysis, and per-engine adaptation.

Voice Options

Premium Voices (DigiFrens+)

ElevenLabs integration with 30+ neural voices. Features:

  • Streaming TTS with word-by-word captions
  • Per-avatar voice assignment
  • Natural prosody and expressiveness

System Voices (Free)

Built-in AVSpeechSynthesizer voices available on all devices.

Planned: On-Device TTS (Kokoro)

Kokoro TTS (82M params, ~86MB quantized) is planned as a free offline alternative:

Feature | ElevenLabs     | Kokoro (Planned) | System
Quality | Excellent      | Good             | Basic
Cost    | DigiFrens+     | Free             | Free
Offline | No             | Yes              | Yes
Voices  | 30+            | 11               | Many
Latency | ~200ms network | ~300ms local     | Instant

Lip Synchronization

Real-time lip sync maps audio/text to viseme shapes on the avatar:

  • 5 viseme categories for VRM avatars
  • Text-based phoneme estimation for immediate sync
  • Audio energy analysis for natural movement timing
  • Per-engine adaptation - VRM morph targets, Live2D parameters, Gaussian position deltas

12. Calendar Integration

Status: Production-ready

Features

  • Read calendar events with natural language queries
  • Create events and reminders from natural language
  • Support for recurring events (daily, weekly, monthly)
  • Advanced time parsing (ranges, relative dates, durations)
  • Proactive 30-minute event reminders
  • Schedule stress analysis and break suggestions
  • Graceful degradation without calendar permissions

Natural Language Parsing

Input                                   | Interpretation
"Meeting tomorrow at 9am-3pm"           | 6-hour event, tomorrow
"Remind me to call Mom next Monday"     | Reminder, next Monday
"Block time this afternoon for 2 hours" | 2-hour event, today at 2pm
"Cancel my dentist appointment"         | Event cancellation

13. Proactive Intelligence

The proactive intelligence system enables companions to initiate contextually relevant interactions without being prompted.

Action Types

Type          | Trigger                    | Example
Follow-up     | Life event needs follow-up | "How did the interview go?"
Check-in      | 3+ days inactivity         | "Hey, I haven't heard from you in a while"
Celebration   | Milestones reached         | "Congrats on reaching your reading goal!"
Concern       | Crisis or anomaly detected | Supportive outreach during emotional distress
Reminder      | User goal or intention     | "You mentioned wanting to start exercising"
Encouragement | Upcoming event support     | "Good luck with your presentation tomorrow"

Pattern Detection

Pattern Type        | Analysis
Emotional Triggers  | 90-day timeline analysis identifies situations that reliably produce specific emotions
Coping Strategies   | Tracks mood recovery sequences to understand what helps the user feel better
Behavioral Routines | Day/time patterns reveal the user's natural rhythms

Data Flow

Conversations feed two analysis layers: LifeContextTracker (life events, follow-ups) and PatternDetectionService (emotional triggers, coping strategies, routines, built on the 90-day timeline, mood recovery sequences, and day/time patterns). Both feed ProactiveIntelligenceEngine, which emits proactive actions, and ReflectiveMemoryProcessor, which produces context hints.

14. Privacy & Security

Design Principles

DigiFrens follows a local-first, privacy-by-default architecture. All sensitive data stays on the user's device.

Data Storage Security

Data                 | Storage          | Security Level
Conversations        | Local SQLite     | Device-only, never uploaded
Memories             | Local SQLite     | Device-only
LUM cognitive graph  | Local SQLite     | Device-only
Emotional timeline   | Local SQLite     | Device-only
API keys             | Keychain         | Hardware-encrypted, kSecAttrAccessibleWhenUnlockedThisDeviceOnly
Device ID            | Keychain         | Hardware-encrypted, device-local
Passkey credentials  | Keychain         | Hardware-encrypted
User preferences     | UserDefaults     | Standard
Custom avatar models | Documents folder | Device-only

Authentication

  • Automatic device-based accounts - no sign-up required
  • Optional passkey security - WebAuthn protocol with biometric verification
  • No email or password - device ID serves as the user identifier

On-Device Processing

Capability            | Implementation
Text embeddings       | CoreML (GTE-Small, 384-dim)
Emotion analysis      | On-device NLP + learned models
Avatar reconstruction | CoreML (LAM, 557M params, FLAME-guided)
AI responses          | Apple Intelligence (on-device, optional)

Cloud Interactions

The only cloud interactions are:

  • AI providers (optional) - when using OpenAI, Anthropic, or OpenRouter
  • ElevenLabs (optional) - premium voice synthesis
  • Subscription verification - StoreKit receipt validation
  • Model download - one-time LAM model download for custom avatars

Conversation content is never sent to DigiFrens servers.


15. Subscription Model

Tiers

Free ($0)

  • 2 VRM avatars (Haru, Emi) + 2 Live2D avatars (Hiyori, Mao)
  • Apple Intelligence AI (on supported devices)
  • Bring your own API keys (OpenAI, Anthropic, OpenRouter)
  • Basic system voices
  • Full memory system and LUM
  • Calendar integration

DigiFrens+ ($15/month)

  • Everything in Free, plus:
  • Download and use local LEAP LLMs (no API key needed)
  • Use GPT without an API key
  • Premium ElevenLabs voices (30+ options)
  • Custom avatar creation (Gaussian Splatting)
  • Unlimited interaction time
  • Up to 3 custom avatars
  • Voice customization per avatar
  • Priority processing
  • Early access to new features
  • Priority support
  • 1-week free trial included

16. Platform Requirements

System Requirements

Requirement | Minimum   | Recommended
iOS version | 26.0      | 26.0+
Device      | iPhone 11 | iPhone 15 Pro+
Storage     | ~500MB    | ~2GB (with custom avatars)

17. Development Status & Roadmap

Current Status

Development began July 2025. The core platform — triple avatar engine, five-phase memory system, LUM cognitive graph, six AI providers, premium voice synthesis, and calendar integration — is fully implemented and functional.

Current focus areas include on-device custom avatar reconstruction via CoreML and Gaussian Splatting rendering polish.

Roadmap

Feature           | Description                                  | Status
On-device TTS     | Kokoro 82M-param model as free offline voice | Planned
Desktop companion | macOS app via Catalyst or native             | Documented
Live2D widgets    | Home and lock screen widgets                 | Documented
Multimodal input  | Image and audio input support                | Documented
Crypto payments   | x402 payment agent integration               | Documented
AR integration    | Augmented reality avatar overlay             | Whitepaper Phase 2

18. Codebase Statistics

Metric                      | Value
Swift source files          | 175
Service domains             | 14
AI providers                | 6
Avatar engines              | 3 (VRM, Live2D, Gaussian Splat)
Memory retrieval strategies | 9
Mental process states       | 11
Emotion categories          | 15 (7 core + 8 complex)
HEXACO personality traits   | 6
Database tables             | 20+

Appendix: Key References

Research Papers

  • LAM: Large Avatar Model (SIGGRAPH 2025, arXiv:2502.17796) - single-image feed-forward animatable Gaussian avatar reconstruction via FLAME-guided transformer
  • FLAME: Faces Learned with an Articulated Model and Expressions (SIGGRAPH Asia 2017) - statistical 3D morphable face model trained on 33,000+ scans, providing shape/expression/pose disentanglement
  • 3D Gaussian Splatting for Real-Time Radiance Field Rendering (SIGGRAPH 2023) - explicit radiance field representation using anisotropic 3D Gaussians
  • 3D Gaussian Blendshapes (SIGGRAPH 2024) - pure linear blendshape deformation for Gaussians
  • HEXACO Personality Model - six-factor personality framework

Open Source Dependencies

Package           | License    | Purpose
MetalSplatter     | MIT        | Gaussian splat rendering on Metal
SplatIO           | MIT        | .splat/.ply/.spz file I/O
spz-swift         | MIT        | SPZ compressed format support
Live2D Cubism SDK | Commercial | 2D avatar animation
LAM               | Apache-2.0 | Feed-forward avatar reconstruction (FLAME + Gaussian Splatting)
FLAME             | CC-BY-4.0  | 3D morphable face model (geometric prior + animation rig)

Inspiration

  • Cognitive science research - associative memory models, spreading activation retrieval, 6Rs processing pipeline
  • OpenSouls - mental process state machine pattern for adaptive conversation

DigiFrens - Built with Swift, SwiftUI, and the power of Apple's ecosystem.

Version 3.1 | April 2026 | All rights reserved.