Apr 15, 2025

Justin Trugman
Cofounder & Head of Technology
BetterFutureLabs

Multi-threading with open-source LLMs was a game changer for us. We've been able to run independent tasks simultaneously, instead of waiting for agents sequentially, slashing turnaround times on document processing. Our team can now process 5x more cases without proportional headcount increases.
— Justin Trugman, Cofounder & Head of Technology, BetterFutureLabs
Consider a real-world scenario where you're building a medical diagnosis support system. Your system needs multiple specialized multi-agent teams simultaneously analyzing different aspects of a patient case: symptom pattern analysis, literature review, drug interaction checking, imaging analysis, and treatment planning.
Each team must reach internal consensus through multi-agent discussions before contributing to the final diagnostic assessment.
Sequential execution: 40-58 minutes total (each team waits for the previous to complete)
Parallel execution: 10-15 minutes total (limited by the longest-running team)
This 70-75% reduction in processing time transforms diagnosis from a lengthy batch process into a near real-time clinical decision support tool, potentially improving patient outcomes through faster, more comprehensive analysis.
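The shape of that speedup is easy to demonstrate with a toy sketch: simulated "teams" (plain sleeps standing in for agent discussions, seconds standing in for minutes) run through a thread pool, and the wall-clock time tracks the slowest team rather than the sum. The names and durations here are illustrative:

```python
import time
import concurrent.futures

def run_team(name, duration):
    """Simulated diagnostic team: the sleep stands in for multi-agent discussion."""
    time.sleep(duration)
    return f"{name}: consensus reached"

# Illustrative teams; seconds stand in for minutes
teams = {"symptom_analysis": 0.3, "literature_review": 0.2, "imaging_analysis": 0.4}

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(run_team, n, d) for n, d in teams.items()]
    results = [f.result() for f in futures]
elapsed = time.perf_counter() - start

# Wall-clock time is bounded by the slowest team (~0.4s), not the sum (~0.9s)
assert elapsed < sum(teams.values())
```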
We've found success using Python's concurrent.futures.ThreadPoolExecutor as the foundation for parallel agent execution. This approach has several key advantages: it's part of the standard library, it's well suited to I/O-bound work like LLM API calls, and it handles thread lifecycle management for you.
How it works: ThreadPoolExecutor creates a pool of worker threads that can execute functions concurrently. When you submit a task, it gets assigned to an available worker thread. The key insight is that while one thread waits for an API response, other threads can continue processing their own tasks.
import concurrent.futures

# Define independent diagnostic analysis tasks
diagnostic_tasks = [
    "symptom_pattern_analysis",
    "literature_review",
    "drug_interaction_check",
    "imaging_analysis",
    "treatment_planning",
]

# Execute diagnostic teams in parallel
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [
        executor.submit(run_diagnostic_team, team_id, analysis_type, patient_data)
        for team_id, analysis_type in enumerate(diagnostic_tasks)
    ]
    concurrent.futures.wait(futures)  # Wait for all teams to complete

Each run_diagnostic_team call creates an independent set of AG2 agents (medical specialist, clinical reviewer, supervisor) that collaborate on their specific analysis without interfering with other parallel teams. The concurrent.futures.wait() function blocks until all teams have completed their work, ensuring synchronization before proceeding to further diagnostic synthesis.
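The synchronization behavior of concurrent.futures.wait() can be checked in isolation with a stub team function (fake_team and its sleep are placeholders for real AG2 teams):

```python
import time
import concurrent.futures

def fake_team(analysis_type):
    """Stub team: the sleep stands in for API-bound agent discussion."""
    time.sleep(0.1)
    return analysis_type.upper()

tasks = ["symptom_pattern_analysis", "literature_review"]

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(fake_team, t) for t in tasks]
    done, not_done = concurrent.futures.wait(futures)

# wait() blocks until every future finishes, so nothing is left pending
assert not not_done
results = sorted(f.result() for f in done)
```

Because wait() returns both a done and a not_done set, it also supports partial waits via its timeout and return_when parameters when full synchronization isn't required.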
For this example, we're using an Agent Factory Pattern to showcase the power of parallel execution. Rather than sharing agent instances across parallel tasks, each parallel execution creates completely fresh agent instances. This prevents state contamination and ensures true independence.
def run_diagnostic_team(team_id, analysis_type, patient_data):
    """
    Factory function that creates a complete independent agent team
    for a specific medical analysis task
    """
    # Create fresh user proxy for this diagnostic team
    user_proxy = UserProxyAgent(
        name=f"coordinator_{analysis_type}_{team_id}",
        is_termination_msg=lambda msg: "TERMINATE" in msg["content"],
        human_input_mode="NEVER",
        max_consecutive_auto_reply=1,
        code_execution_config=False,
    )

    # Create specialized medical analyst with unique configuration
    medical_specialist = GPTAssistantAgent(
        name=f"specialist_{analysis_type}_{team_id}",
        instructions=medical_instructions[analysis_type],
        overwrite_instructions=True,  # Ensure clean state
        overwrite_tools=True,
        llm_config={
            "config_list": config_list,
            "tools": diagnostic_tools[analysis_type],
            "assistant_id": specialist_assistant_ids[analysis_type],
        },
    )

    # Register analysis-specific functions
    medical_specialist.register_function(
        function_map=diagnostic_functions[analysis_type]
    )

    # Create complete diagnostic team and execute
    # (clinical_reviewer and supervisor are constructed the same way; omitted here)
    team = [user_proxy, medical_specialist, clinical_reviewer, supervisor]
    groupchat = ag2.GroupChat(agents=team, messages=[], max_round=15)
    chat_manager = ag2.GroupChatManager(groupchat=groupchat, llm_config={"config_list": config_list})

    # Execute diagnostic analysis with patient context
    user_proxy.initiate_chat(chat_manager, message=f"Perform {analysis_type} for patient: {patient_data}")

Setting overwrite_instructions=True ensures no state leakage between executions.

The most dangerous trap in parallel execution is sharing mutable state between agents. When multiple threads modify the same data structure simultaneously, you get unpredictable behavior and data corruption. This is especially problematic with global dictionaries, shared configuration objects, or any state that gets modified during execution.
# ❌ BAD: Shared mutable state
shared_patient_analysis = {}

def bad_diagnostic_process(analysis_type):
    shared_patient_analysis[analysis_type] = perform_analysis(...)  # Race condition!

# ✅ GOOD: Independent storage
def good_diagnostic_process(analysis_type, patient_id):
    result = perform_analysis(...)
    save_to_file(f"diagnosis_{patient_id}_{analysis_type}.json", result)

Creating unlimited threads will overwhelm your system and degrade performance for all tasks. Each thread consumes memory and system resources. Too many threads also increase context switching overhead, making everything slower rather than faster.
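A quick, self-contained way to see that a bounded pool caps concurrency is to track the peak number of simultaneously running tasks; the counter, lock, and tracked_task below are illustrative:

```python
import threading
import time
import concurrent.futures

active = 0   # tasks currently running
peak = 0     # highest concurrency observed
lock = threading.Lock()

def tracked_task(task_id):
    """Illustrative task that records how many copies run at once."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # simulated I/O wait
    with lock:
        active -= 1
    return task_id

tasks = list(range(10))
max_workers = min(len(tasks), 5)
with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
    results = list(executor.map(tracked_task, tasks))

# The pool never runs more than max_workers tasks simultaneously
assert peak <= max_workers
```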
# ❌ BAD: Unlimited workers
with ThreadPoolExecutor() as executor:  # Could create too many threads
    ...

# ✅ GOOD: Controlled resource usage
max_workers = min(len(tasks), 5)
with ThreadPoolExecutor(max_workers=max_workers) as executor:
    ...

In parallel execution, individual tasks will fail. Network timeouts, API rate limits, and processing errors are inevitable. If you don't handle these gracefully, one failed task can crash your entire parallel workflow, wasting all the work completed by successful tasks.
# ❌ BAD: No error handling
futures = [executor.submit(task) for task in tasks]
results = [f.result() for f in futures]  # Will crash on any failure

# ✅ GOOD: Graceful error handling
for future in concurrent.futures.as_completed(futures):
    try:
        result = future.result(timeout=300)
        handle_success(result)
    except Exception as e:
        handle_error(e)

Robust error handling is essential for production parallel execution. You need to capture and log failures without stopping the entire workflow. The key is to collect both successful results and error information, then decide how to handle partial failures based on your business requirements.
import logging

def execute_parallel_tasks(doc_id, tasks, document_data):
    results = {}
    errors = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        future_to_task = {
            executor.submit(process_document, doc_id, task, document_data): task
            for task in tasks
        }
        for future in concurrent.futures.as_completed(future_to_task):
            task = future_to_task[future]
            try:
                results[task] = future.result(timeout=300)
                logging.info(f"Task {task} completed successfully")
            except Exception as e:
                errors[task] = str(e)
                logging.error(f"Task {task} failed: {e}")
    return results, errors

The default ThreadPoolExecutor worker count is min(32, (os.process_cpu_count() or 1) + 4). This formula preserves at least 5 workers while avoiding excessive resource usage on many-core machines. For more details, see the Python ThreadPoolExecutor documentation.
# Calculate optimal worker count based on Python's official recommendation
import os

def get_optimal_workers(task_count):
    # Follow Python's default formula, but never exceed the number of tasks
    default_workers = min(32, (os.process_cpu_count() or 1) + 4)
    return min(task_count, default_workers)

After implementing the core patterns, follow these production-ready guidelines:
Use the default of min(32, (os.process_cpu_count() or 1) + 4) workers for I/O-bound agent operations.

Whatever monitoring tool or logging system you're using, track key metrics for parallel agent execution, such as per-task duration and success or failure status.
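As one concrete illustration, here is a minimal sketch that records per-task duration and outcome around a thread pool, which could feed whatever monitoring backend you use; the timed wrapper, analyze task, and metric field names are all illustrative:

```python
import logging
import time
import concurrent.futures

logging.basicConfig(level=logging.INFO)

def timed(fn, task):
    """Record duration and outcome for one task as a metrics dict."""
    start = time.perf_counter()
    try:
        result = fn(task)
        status = "success"
    except Exception:
        result, status = None, "error"
    return {"task": task, "status": status,
            "duration_s": time.perf_counter() - start, "result": result}

def analyze(task):
    time.sleep(0.01)  # stand-in for agent work
    return task.upper()

tasks = ["labs", "imaging"]
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    metrics = list(executor.map(lambda t: timed(analyze, t), tasks))

for m in metrics:
    logging.info("task=%s status=%s duration=%.3fs",
                 m["task"], m["status"], m["duration_s"])
```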
Parallel agent execution with AG2 transforms research and analysis applications from sequential batch operations into concurrent workflows. The patterns outlined in this guide provide a production-ready foundation for scaling multi-agent systems.
By applying proper resource management patterns and designing for independence, you can build systems that scale horizontally while maintaining the collaborative intelligence that makes multi-agent systems effective.
Parallel execution fundamentally changes how users interact with multi-agent systems. Instead of waiting for sequential processing, users get concurrent analysis across multiple specialized domains. Start with these patterns, measure your results, and iterate based on your specific use case requirements.