“When to Hand Off, When to Work Together”

Expanding Human-Agent Co-Creative Collaboration through Concurrent Interaction


The dominant paradigm for human-agent collaboration is delegation: describe a goal, the agent executes, you review. But as agent execution becomes observable step by step, a new question emerges:

When humans can see an agent working in real time, what do they naturally want to do?

They don’t just watch and wait — they want to jump in.

We call this concurrent interaction: humans and agents working freely within the same workspace, even on overlapping work, at the same time. We built CLEO to support it and studied 10 creative practitioners across 214 logged turns of human-agent interaction.


Abstract

Delegation has emerged as the dominant paradigm for creative human-agent collaboration, yet emerging tool-calling interfaces are making agent execution observable step by step, raising an unexplored question: “When humans can see an agent working step by step, what do they naturally want to do?” Our Study 1 (N=10 professional designers) revealed that process visibility naturally prompted concurrent interaction: users intervened mid-execution and demonstrated intent through direct manipulation. It also exposed conflicts when agents could not distinguish feedback from independent work. Based on these findings, we developed CLEO, which enables concurrent interaction with the user by interpreting collaborative intent and adapting in real time. In Study 2 (N=10, a two-day study with stimulated-recall interviews), we analyzed 214 turns of work with CLEO, identifying five action patterns, six triggers, and four enabling factors that explain when designers choose delegation (70.1%), direction (28.5%), or concurrent work (31.8%). We present a decision model, design implications, and an annotated dataset, positioning concurrent interaction as the key to better delegation.
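As a concrete illustration of the disambiguation problem above, here is a minimal Python sketch, assuming a hypothetical canvas editor where every element carries an ID: an edit that touches something the agent modified this turn is read as feedback, and anything else as independent work. This is a toy heuristic for exposition, not CLEO’s actual mechanism.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UserEdit:
    """A single user action observed while the agent is executing.
    The ID scheme and action kinds are hypothetical."""
    element_id: str   # canvas element the user touched
    kind: str         # e.g. "move", "restyle", "create", "delete"

def classify_intent(edit: UserEdit, agent_touched: set[str]) -> str:
    """Toy rule: an edit that lands on an element the agent has created
    or modified this turn is read as feedback on its work-in-progress;
    anything else is read as the user's independent, parallel work.
    A real system would also weigh timing, selection context, and the
    verbal channel; this sketch only shows the basic split."""
    return "feedback" if edit.element_id in agent_touched else "independent_work"

# The agent keeps a running set of element IDs it has touched this turn.
agent_touched = {"rect_12", "text_7"}
print(classify_intent(UserEdit("rect_12", "restyle"), agent_touched))  # feedback
print(classify_intent(UserEdit("img_3", "create"), agent_touched))    # independent_work
```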


Findings

When working alongside CLEO, designers didn’t interact in one fixed way. Through analysis of 214 turns, we identified five distinct action patterns that emerged naturally during agent execution — ranging from full delegation to real-time co-editing.

Five Interaction Patterns

Hands-off

Fully delegating — the user steps away to focus on their own work while the agent executes independently.


Observational

Watching the agent's step-by-step execution without intervening — often to build a mental model or find the right moment to act.


Directive

Providing verbal guidance mid-execution — either adjusting the agent's approach with new instructions, or redirecting it to a new task entirely.


Concurrent

Directly engaging with the agent's work-in-progress — co-editing the same element, completing a pending subtask, or demonstrating intent through direct manipulation rather than words.


Terminating

Stopping the agent's execution before completion — handing full control back to the user.


What Triggers Intervention?

We identified six triggers: specific situations that prompt users to move from passive observation into active intervention.

Idea Spark from Agent’s Work-in-Progress: Observing the agent’s intermediate outputs sparks new creative ideas the user hadn’t previously considered.

Need for Early Outcome Visibility: The user wants to see or use the agent’s results sooner, to plan next steps or preview the output before completion.

Readiness for Fine-grained Detailing: The current stage requires detailed refinement aligned with the user’s specific vision, prompting them to take over.

Task Interpretation Mismatch: The agent pursues a valid but unintended interpretation, or the user realizes their original request was too abstract.

Execution Quality Drop: The agent’s performance deteriorates in speed, output fidelity, or overall quality below an acceptable threshold.

Emerging New Task for Agent: The user conceives a new task (a different direction or logical next step) while observing the agent’s current work.

What Shapes Which Action Users Take?

Even under the same trigger, different users respond differently. We identified four enabling factors that explain why.

Mental Model of Agent Capability: Whether the user has a clear sense of what the agent can and cannot do for the current task.

Perceived Task Importance: How the user weighs the importance of their own parallel work relative to what the agent is doing.

Preferred Intervention Modality: Whether the user finds it easier to express intent through words or through direct manipulation, or is uncertain which to use.

Expectation of Agent Response: Whether the user believes their intervention, verbal or behavioral, will actually lead to successful task completion.

How It All Comes Together: A Decision Model

These patterns, triggers, and enabling factors don’t operate in isolation — they form a dynamic system. We synthesized our findings into a decision model that describes how users navigate collaboration across six recurring interaction loops, from the moment the agent begins a task to the moment it ends.

Decision model of human-agent co-creative collaboration

Full delegation, continuous observation, concurrent engagement, directive guidance — users move between these states fluidly within a single turn, driven by shifting priorities, evolving confidence, and the nature of the work itself. This model accounts for all 214 observed turns without contradiction.
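To make the loop structure concrete, the sketch below encodes the five action patterns as states and the six triggers as transition events in Python. The default trigger-to-action mapping is purely illustrative; in the actual model, the same trigger can resolve to different actions depending on the enabling factors above.

```python
from enum import Enum
from typing import Optional

class Action(Enum):
    HANDS_OFF = "hands-off"
    OBSERVATIONAL = "observational"
    DIRECTIVE = "directive"
    CONCURRENT = "concurrent"
    TERMINATING = "terminating"

class Trigger(Enum):
    IDEA_SPARK = "idea spark from work-in-progress"
    EARLY_OUTCOME = "need for early outcome visibility"
    FINE_DETAILING = "readiness for fine-grained detailing"
    MISMATCH = "task interpretation mismatch"
    QUALITY_DROP = "execution quality drop"
    NEW_TASK = "emerging new task for agent"

# Illustrative mapping only: in the paper's model the same trigger can
# lead to different actions depending on the four enabling factors.
DEFAULT_RESPONSE = {
    Trigger.IDEA_SPARK: Action.CONCURRENT,
    Trigger.EARLY_OUTCOME: Action.OBSERVATIONAL,
    Trigger.FINE_DETAILING: Action.CONCURRENT,
    Trigger.MISMATCH: Action.DIRECTIVE,
    Trigger.QUALITY_DROP: Action.TERMINATING,
    Trigger.NEW_TASK: Action.DIRECTIVE,
}

def next_action(current: Action, trigger: Optional[Trigger]) -> Action:
    """No trigger: stay in the current state (e.g. keep observing).
    Trigger: move to the illustrative default response above."""
    return current if trigger is None else DEFAULT_RESPONSE[trigger]

# One possible loop: delegate, notice a quality drop, terminate.
state = Action.HANDS_OFF
state = next_action(state, None)                   # still hands-off
state = next_action(state, Trigger.QUALITY_DROP)   # -> TERMINATING
print(state)
```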


Why Concurrent Interaction Matters

💬
Verbal instructions are ambiguous. Direct action is not.

Users express fine-grained intent through action — demonstrating what they want, rather than describing it.

🔍
Behavioral signals reveal implicit working style.

What you correct, what you leave alone, and what you value — concurrent actions surface the preferences that are hard to put into words.

🤝
This could help agents work the way you actually want.

By capturing how a user intervenes and what they care about, agents could learn to align with their working style over time — making future collaboration more personalized and delegation more confident.
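As a speculative sketch of that idea (not a system we built), the snippet below accumulates concurrent interventions into a coarse preference profile: element kinds the user keeps correcting get flagged, so an agent could slow down or ask before touching them. The categories and the update rule are illustrative assumptions.

```python
from collections import Counter

class PreferenceProfile:
    """Speculative sketch: turn a user's concurrent interventions into
    coarse signals an agent could consult on later tasks."""

    def __init__(self) -> None:
        self.corrections = Counter()  # what the user re-does after the agent
        self.untouched = Counter()    # what the user consistently leaves alone

    def record(self, element_kind: str, corrected: bool) -> None:
        (self.corrections if corrected else self.untouched)[element_kind] += 1

    def needs_care(self, element_kind: str) -> bool:
        """Flag element kinds the user corrects more often than not."""
        return self.corrections[element_kind] > self.untouched[element_kind]

profile = PreferenceProfile()
profile.record("typography", corrected=True)
profile.record("typography", corrected=True)
profile.record("layout", corrected=False)
print(profile.needs_care("typography"))  # True: user tends to redo type choices
print(profile.needs_care("layout"))      # False
```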


Dataset

We release a dataset of 214 co-creative interaction turns, with logged actions, triggers, and designer rationales, to support future research in human-agent co-creative collaboration.

Coming soon — the dataset is currently being prepared for public release.
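Until then, the sketch below shows one plausible way such turns could be represented and loaded. The JSON Lines layout and every field name are assumptions, not the published schema.

```python
import json
from dataclasses import dataclass

@dataclass
class Turn:
    """One interaction turn. Every field name here is an assumption;
    the released dataset may use a different schema."""
    turn_id: int
    participant: str     # e.g. "P03"
    actions: list[str]   # subset of the five action patterns
    triggers: list[str]  # subset of the six triggers, possibly empty
    rationale: str       # designer's stimulated-recall explanation

def load_turns(path: str) -> list[Turn]:
    """Read one JSON object per line (JSON Lines) into Turn records."""
    with open(path, encoding="utf-8") as f:
        return [Turn(**json.loads(line)) for line in f if line.strip()]

# A record in the hypothetical schema:
example = Turn(
    turn_id=42,
    participant="P03",
    actions=["observational", "concurrent"],
    triggers=["readiness for fine-grained detailing"],
    rationale="I wanted to adjust the spacing myself while it kept drawing.",
)
print(json.dumps(example.__dict__))
```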


Bibtex

@article{son2026when,
      title={"When to Hand Off, When to Work Together": Expanding Human-Agent Co-Creative Collaboration through Concurrent Interaction},
      author={Kihoon Son and Hyewon Lee and DaEun Choi and Yoonsu Kim and Tae Soo Kim and Yoonjoo Lee and John Joon Young Chung and HyunJoon Jung and Juho Kim},
      year={2026},
      eprint={2603.02050},
      archivePrefix={arXiv},
      primaryClass={cs.HC}
}


This research was conducted at KIXLAB, KAIST.