WARC-Bench

Web Archive Based Benchmark for GUI Subtask Executions

Abstract

We present Orby Web Agent, a comprehensive framework for developing and evaluating vision-based web agents using WARC (Web ARChive) file servers and BrowserGym environments. Our system enables realistic benchmarking of automated web interaction by serving archived web pages as live websites, providing reproducible and controlled testing environments.

The framework features SvaV4, a pure-vision agent optimized for short-horizon tasks: a single model call both evaluates whether the task is complete and generates the next action. Our approach supports diverse web automation scenarios, including real-world environments (Zendesk, GitHub) and synthetic test cases, so agent capabilities can be evaluated across different interaction patterns.

This work provides researchers and practitioners with a unified toolkit for developing, testing, and benchmarking web agents in reproducible environments, advancing the state of automated web interaction and multi-step task completion.

Key Features

WARC-Based Replay

Serve archived web pages as live websites for reproducible benchmarking with controlled, deterministic environments.
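As a rough illustration of the replay idea, the sketch below serves pre-recorded responses from a local HTTP server using only the Python standard library. It is not the WARC-Bench implementation: the `ARCHIVE` dictionary is a hypothetical stand-in for responses parsed out of a WARC file, and `ReplayHandler`, `start_replay_server`, and `fetch` are names invented for this example.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for responses parsed from a WARC file:
# request path -> (status code, content type, body bytes).
ARCHIVE = {
    "/index.html": (200, "text/html; charset=utf-8", b"<h1>Archived page</h1>"),
}


class ReplayHandler(BaseHTTPRequestHandler):
    """Answers every GET from the in-memory archive, never from the live web."""

    def do_GET(self):
        status, ctype, body = ARCHIVE.get(
            self.path, (404, "text/plain; charset=utf-8", b"not in archive")
        )
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_replay_server(port=0):
    """Start the replay server in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), ReplayHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server, path):
    """Fetch a path from the replay server, bypassing any configured proxy."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    url = f"http://127.0.0.1:{server.server_address[1]}{path}"
    with opener.open(url, timeout=5) as resp:
        return resp.read()
```

Because every response comes from the recorded archive, repeated requests for the same path return identical bytes, which is the property that makes runs reproducible.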

Vision-Driven Agent

The SvaV4 agent interacts with web pages through vision alone, with efficient single-call execution for short-horizon tasks.
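One way to picture combining completion evaluation and action generation in a single model call is sketched below. This is a hypothetical illustration, not SvaV4's actual prompt or output format: the JSON fields `task_complete` and `action`, the `Step` type, the `decide` function, and the `fake_model` stub are all assumptions made for the example.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    task_complete: bool
    action: Optional[str]  # None once the task is judged complete


def decide(model: Callable[[bytes, str], str], screenshot: bytes, goal: str) -> Step:
    """Make one model call and read both the completion verdict and the next action."""
    raw = model(screenshot, goal)  # the only model call in this function
    data = json.loads(raw)
    done = bool(data["task_complete"])
    return Step(task_complete=done, action=None if done else data.get("action"))


def fake_model(screenshot: bytes, goal: str) -> str:
    """Stub standing in for a vision model; it returns a fixed, made-up response."""
    return json.dumps({"task_complete": False, "action": "click(120, 340)"})
```

With the stub, `decide(fake_model, b"", "Open the settings page")` yields a `Step` that is not complete and carries the stubbed click action. A real agent would replace `fake_model` with a call to a vision model.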

BrowserGym Integration

Seamless integration with BrowserGym for standardized action spaces including click, type, scroll, and more.

Comprehensive Evaluation

Built-in evaluation framework with trajectory recording, visualization, and task completion metrics.
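To make trajectory recording and a task completion metric concrete, here is a minimal sketch. The `Trajectory` record and the `success_rate` helper are hypothetical names chosen for illustration and do not reflect the framework's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Trajectory:
    task_id: str
    actions: List[str] = field(default_factory=list)  # actions in the order taken
    success: bool = False  # whether the task was judged complete


def success_rate(trajectories: List[Trajectory]) -> float:
    """Fraction of recorded trajectories that ended in success (0.0 if none recorded)."""
    if not trajectories:
        return 0.0
    return sum(t.success for t in trajectories) / len(trajectories)
```

For example, two recorded trajectories with one success give a success rate of 0.5.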

Demonstrations

Disclaimer

The names and data portrayed in these demonstrations are either synthetic or sourced from openly available real websites. They hold absolutely no connection to the authors, direct or indirect.

Citation

BibTeX

@misc{srivastava2025warcbenchwebarchivebased,
  title={WARC-Bench: Web Archive Based Benchmark for GUI Subtask Executions},
  author={Sanjari Srivastava and Gang Li and Cheng Chang and Rishu Garg and Manpreet Kaur and Charlene Y. Lee and Yuezhang Li and Yining Mao and Ignacio Cases and Yanan Xie and Peng Qi},
  year={2025},
  eprint={2510.09872},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2510.09872},
}