Agentic Digital Twins: Why Understanding Behavioral Variability Matters
By Lior Limonad and Fabiana Fournier
Modern AI increasingly relies on agentic systems: multi-step workflows orchestrated by large language models (LLMs) using tools and environment feedback. These systems are powerful and flexible, but their complexity introduces a significant challenge: behavioral variability. Even with identical prompts and tools, an agent may produce different outputs on each run. This variability complicates debugging, undermines reliability, and makes it harder to align agent behavior with intended outcomes.
The Core Idea
In our recent work in the AutoTwin project, we present a practical solution: apply techniques from process and causal discovery to the execution trajectories of agentic AI, captured in an agentic ‘Digital Twin’ infrastructure.
Process discovery reconstructs workflow models (think BPMN) from event logs.
Causal discovery infers causal relationships between variables — e.g., whether rephrasing a prompt step leads to different outcomes downstream.
By analyzing agent execution logs with these tools, developers can visualize and quantify expected and unexpected variations in how agents behave.
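To make this concrete, here is a minimal sketch of the kind of event log both techniques consume. The schema (column names such as case_id, activity, tool, and outcome) is our illustrative assumption, not a prescribed format: each row records one agent action, and all actions from a single run share a case identifier.

```python
import pandas as pd

# Hypothetical event log: one row per agent action, one case_id per run.
# Column names follow common process-mining conventions; the exact
# schema is an assumption for illustration.
log = pd.DataFrame([
    {"case_id": "run-1", "activity": "plan",      "timestamp": "2024-05-01T10:00:00", "tool": None,         "outcome": None},
    {"case_id": "run-1", "activity": "call_tool", "timestamp": "2024-05-01T10:00:02", "tool": "search",     "outcome": None},
    {"case_id": "run-1", "activity": "answer",    "timestamp": "2024-05-01T10:00:05", "tool": None,         "outcome": "success"},
    {"case_id": "run-2", "activity": "plan",      "timestamp": "2024-05-01T11:00:00", "tool": None,         "outcome": None},
    {"case_id": "run-2", "activity": "call_tool", "timestamp": "2024-05-01T11:00:03", "tool": "calculator", "outcome": None},
    {"case_id": "run-2", "activity": "retry",     "timestamp": "2024-05-01T11:00:06", "tool": "search",     "outcome": None},
    {"case_id": "run-2", "activity": "answer",    "timestamp": "2024-05-01T11:00:09", "tool": None,         "outcome": "failure"},
])
log["timestamp"] = pd.to_datetime(log["timestamp"])
```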
Why This Matters
Better Observability
Gaining visibility into hidden execution paths helps detect when agents deviate from intended workflows.
Root-Cause Analysis
Causal analysis pinpoints which components (prompt phrasing, tool choice, system state) are driving variability.
Iterative Refinement
With insights from variability patterns, developers can systematically fine-tune prompts, tool usage, or agent configurations.
How It Works (At a Glance)
Log collection and persistence in Agentic Twins
Capture a structured record of each agent action: which prompts were used, which tools were invoked, and when.
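A minimal logging sketch, assuming a simple JSON-lines file as the persistence layer; the TwinLogger class and its record() method are hypothetical illustrations, not the actual Agentic Twin API:

```python
import json
import time
import uuid


class TwinLogger:
    """Append one JSON record per agent action to a log file.

    A hypothetical sketch of the logging step; the real Agentic Twin
    persistence layer is not specified in this post.
    """

    def __init__(self, path):
        self.path = path
        self.run_id = str(uuid.uuid4())  # one "case" per agent run

    def record(self, activity, prompt=None, tool=None, **extra):
        event = {
            "case_id": self.run_id,
            "activity": activity,
            "timestamp": time.time(),
            "prompt": prompt,
            "tool": tool,
            **extra,
        }
        with open(self.path, "a") as f:
            f.write(json.dumps(event) + "\n")


# Usage: wrap each LLM step and tool invocation.
logger = TwinLogger("agent_events.jsonl")
logger.record("plan", prompt="Decompose the user request")
logger.record("call_tool", tool="search", query="weather in Oslo")
```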
Apply process mining
Use process mining algorithms to generate a workflow diagram depicting typical and variant control flows.
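One way to implement this step is with the open-source pm4py library. The sketch below assumes the event log from the earlier example is available as the pandas DataFrame named log; the inductive miner is just one of several discovery algorithms pm4py offers:

```python
import pm4py

# Map our hypothetical columns onto pm4py's expected attribute names.
event_log = pm4py.format_dataframe(
    log, case_id="case_id", activity_key="activity", timestamp_key="timestamp"
)

# Inductive miner: reconstructs a BPMN model covering both the
# dominant path and the variant control flows observed across runs.
bpmn_model = pm4py.discover_bpmn_inductive(event_log)
pm4py.view_bpmn(bpmn_model)
```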
Run causal discovery
Analyze the logs to uncover causal links (e.g., “using tool A in step 2 causes branching in the outcome”).
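Full causal discovery would run a constraint-based algorithm (such as PC) over all logged variables; as a minimal stand-in, the sketch below tests whether tool choice at one step is statistically dependent on the final outcome. The runs table and its columns are hypothetical:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical per-run table derived from the event log: which tool
# each run used at step 2, and which outcome branch it ended in.
runs = pd.DataFrame({
    "step2_tool": ["search", "search", "calculator", "calculator", "search"],
    "outcome":    ["success", "success", "failure", "failure", "failure"],
})

# A dependence test as a proxy: a significant association between tool
# choice and outcome flags a candidate causal link, to be confirmed by
# a proper causal-discovery algorithm over all logged variables.
table = pd.crosstab(runs["step2_tool"], runs["outcome"])
chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p_value:.3f}")
```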
Present results
Visualize dominant paths, unintended branches, and causal dependencies using intuitive process diagrams or graphs.
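For example, pm4py can render a directly-follows graph whose edge frequencies separate dominant paths from rare, possibly unintended branches. This sketch assumes the event_log produced in the process-mining step above:

```python
import pm4py

# Directly-follows graph: edge weights count how often one activity
# follows another, making dominant paths and rare branches visible.
dfg, start_activities, end_activities = pm4py.discover_dfg(event_log)
pm4py.view_dfg(dfg, start_activities, end_activities)

# Optionally save the diagram for reports.
pm4py.save_vis_dfg(dfg, start_activities, end_activities, "agent_dfg.png")
```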
Key Insights
Agentic LLM systems are inherently variable — sometimes that’s beneficial (exploration!), but unchecked variance harms trust and control.
This methodology bridges the gap between intelligent autonomy and responsible AI practices.
It complements existing debugging approaches, enabling both global visibility and fine-grained insight.
Why You Should Care
If you’re building Agentic Twins to record multi-step, LLM-powered agents that use external tools, this paper offers a practical approach to:
Debugging unexpected behaviors
Ensuring reliable and safe operations
Understanding causes behind agent decisions
These capabilities are key as agentic AI expands across domains like code generation, data pipelines, customer support bots, and robotic control.
Final Thoughts
“Agentic Twin for Process Observability” highlights a crucial frontier in AI engineering: making autonomous AI transparent and as analyzable as any other multi-component software system. By borrowing tools from process mining and causal discovery, we show how to shine a light on opaque agentic behaviors and ultimately build more trustworthy, maintainable, and alignable AI agents.