The libp2p ecosystem spans multiple programming languages, transports, and protocols. Testing interoperability across this diverse landscape has always been challenging. Today, we're announcing a complete rewrite of the test-plans repository that fundamentally improves how we test libp2p implementations.
Why a Complete Rewrite?
The original test framework was built with TypeScript, Docker Compose, and various npm dependencies. While functional, it presented several challenges:
- Complex dependency chains: Node.js, npm, and Python dependencies created friction for contributors
- Platform inconsistencies: Tests behaved differently across Linux, macOS, and Windows
- Limited reproducibility: Recreating test failures was difficult without extensive setup
- Rigid test selection: Running specific subsets of tests required manual configuration
- Slow iteration cycles: The build and test pipeline was optimized for CI, not local development
More importantly, 2026 marks a pivotal year for libp2p research efforts focused on scaling and optimization. As we push the boundaries of what's possible with peer-to-peer networking, we need a test framework that can keep pace. Researchers investigating new transport protocols, scaling strategies, and AI-driven dynamic protocols require fast feedback loops, reproducible experiments, and the ability to iterate quickly on implementations across multiple languages. The old framework simply couldn't support the velocity and rigor that this research demands.
We set out to address these issues with a clear set of goals.
The 10 Primary Goals
1. Cross-Platform Support
The new framework runs on Linux, macOS, and Windows (via WSL). We've eliminated platform-specific code paths and ensured consistent behavior across all environments, so a developer on macOS can reproduce the exact test that failed in CI on Linux.
2. Minimal Dependencies
We reduced dependencies to the essentials:
- Bash 4.0+ (for associative arrays and modern shell features)
- Docker 20.10+ with Docker Compose v2
- yq 4.0+ (for YAML processing)
- Git 2.0+
No Node.js. No npm. No Python. No pip. Just standard tools available on any development machine.
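As a rough illustration of how small a dependency check can be with these tools, here is a minimal sketch. The `missing_deps` helper is hypothetical (it is not taken from the framework), and it only checks that each tool is on the PATH; the version minimums listed above are not verified.

```bash
# Hypothetical helper, not the framework's own code: print the name of every
# required command that cannot be found on PATH. Version checks are omitted.
missing_deps() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Usage: missing_deps bash docker yq git
# Prints nothing when all four tools are installed.
```

An empty result means every tool was found, so a caller can treat any output as a reason to stop before building images.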
3. Rapid Testing in CI/CD and Local Environments
The framework is optimized for both CI pipelines and local development. You can run the same commands locally that CI runs, with identical results. Quick feedback loops enable faster iteration.
4. Follow CI/CD and Programming Conventions
We adhere to standard patterns: clear exit codes, structured logging to stderr, machine-readable output to stdout, and conventional command-line arguments. The barrier to entry is low for anyone familiar with shell scripting.
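These conventions can be sketched in a few lines of shell. The `log` and `report` functions and the tab-separated output format below are assumptions made for illustration; the framework's real output format may differ.

```bash
# Diagnostics go to stderr so they never mix with captured results.
log() {
  printf '[%s] %s\n' "$1" "$2" >&2
}

# Hypothetical reporter: one tab-separated, machine-readable line on stdout,
# plus a conventional exit status (0 for pass, non-zero otherwise).
report() {
  test_name=$1
  test_status=$2
  log INFO "finished $test_name"
  printf '%s\t%s\n' "$test_name" "$test_status"
  [ "$test_status" = "pass" ]
}
```

Because logs and results travel on separate streams, `report ping pass 2>/dev/null` yields only the result line, which is what makes the output safe to pipe into other tools.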
5. Code Reusability via Shared Library
The lib/ directory contains 19 reusable shell scripts (more than 4,000 lines) that provide common functionality:
- Filter engine with alias expansion and negation
- Image building for GitHub, local, and browser sources
- Caching with content-addressed keys
- Test execution coordination with Redis
- Output formatting for consistent terminal UI
Each test suite (perf, transport, hole-punch) imports these libraries, ensuring consistency and reducing duplication.
6. Aggressive Caching
Three levels of caching dramatically improve performance:
| Cache Type | Miss | Hit | Speedup |
|---|---|---|---|
| Test matrix | 2-5s | 50-200ms | 10-100x |
| GitHub snapshots | 5-30s | 1-2s | 5-15x |
| Docker images | 30-300s | 0.1s | 300-3000x |
The test matrix cache uses a content-addressed key computed from images.yaml and all filter parameters. Change a filter, get a new key. Same filters, same cached matrix.
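The idea behind a content-addressed key can be sketched as follows. This is an assumption-laden illustration rather than the framework's implementation: `matrix_cache_key` is a hypothetical name, the 16-character truncation is arbitrary, and `sha256sum` is assumed to be available (on macOS, `shasum -a 256` is the usual equivalent).

```bash
# Hypothetical helper: hash the images file together with every filter
# argument, so any change to either produces a different key.
matrix_cache_key() {
  images_file=$1
  shift
  {
    cat "$images_file"
    printf '%s\n' "$@"   # one filter parameter per line
  } | sha256sum | cut -c1-16
}

# Usage: matrix_cache_key images.yaml --impl-select "~rust"
```

Calling it twice with the same file and arguments returns the same key, which is what lets a later run find the cached matrix.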
7. Fine-Grained Filtering
The two-stage filtering model provides precise control:
Stage 1 (SELECT): Narrow from the complete list
./run.sh --impl-select "~rust|~go" # Only rust and go implementations
Stage 2 (IGNORE): Remove from selected set
./run.sh --impl-ignore "experimental" # Exclude experimental versions
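Conceptually, the two stages behave like a pair of chained filters. The sketch below is a simplification under stated assumptions: `filter_ids` is a hypothetical function, patterns are treated as plain extended regular expressions, and the framework's `~alias` expansion and negation syntax are not modelled.

```bash
# Hypothetical two-stage filter over newline-separated IDs read from stdin.
# $1 is the SELECT pattern, $2 the IGNORE pattern; an empty pattern skips
# that stage. Not the framework's filter engine.
filter_ids() {
  select_pat=$1
  ignore_pat=$2
  # Stage 1 (SELECT) feeds Stage 2 (IGNORE).
  if [ -n "$select_pat" ]; then grep -E -e "$select_pat" || true; else cat; fi |
    if [ -n "$ignore_pat" ]; then grep -E -v -e "$ignore_pat" || true; else cat; fi
}

# Usage with made-up IDs:
#   printf '%s\n' rust-v0.56 rust-experimental go-v0.45 | filter_ids 'rust|go' 'experimental'
```

Running IGNORE only on what SELECT kept means the second pattern never has to account for implementations that were already excluded.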
Filter dimensions include:
- --impl-select/ignore: Filter implementations
- --transport-select/ignore: Filter transports (tcp, quic-v1, ws, etc.)
- --secure-select/ignore: Filter secure channels (noise, tls)
- --muxer-select/ignore: Filter muxers (yamux, mplex)
- --test-select/ignore: Filter by test name pattern
Aliases make common patterns easy:
./run.sh --impl-select "~rust" # Expands to rust-v0.56|rust-v0.55|rust-v0.54|...
./run.sh --impl-ignore "!~rust" # Everything NOT matching rust (negation)
8. YAML Configuration with Comments
All configuration uses YAML files with extensive comments:
- images.yaml: Implementation definitions with versions, transports, and sources
- inputs.yaml: Auto-generated capture of all test parameters
- test-matrix.yaml: Generated test combinations with metadata
Human-readable configuration lowers the barrier to understanding and modification.
9. Local and Remote Test Applications with Patching
Testing local changes doesn't require forking repositories. The patching strategy lets you:
- Clone an implementation locally
- Make your changes
- Generate a patch file
- Reference it in images.yaml
The framework downloads the upstream snapshot, applies your patch, and builds the image. See our Local Testing Strategies guide for details.
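The patch-generation step can be done with standard Git commands. The sketch below shows one way to do it; `make_patch` and `patch_applies` are hypothetical helper names, and how images.yaml refers to the patch file is deliberately left out because it depends on the framework's schema.

```bash
# Hypothetical helper: write the uncommitted changes to tracked files in a
# local clone into a patch file.
make_patch() {
  repo_dir=$1
  patch_file=$2
  git -C "$repo_dir" diff > "$patch_file"
}

# Check that a patch still applies cleanly to a pristine tree, without
# modifying it. Pass an absolute patch path, since git -C changes directory.
patch_applies() {
  tree_dir=$1
  patch_file=$2
  git -C "$tree_dir" apply --check "$patch_file"
}
```

Running `patch_applies` against a clean checkout before a test run is a cheap way to catch a patch that has drifted from the upstream snapshot.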
10. Docker for Arbitrary Network Layouts
Each test suite uses Docker to create isolated, reproducible network environments:
- Transport tests: Simple dialer/listener on a shared network
- Performance tests: Controlled environment for accurate measurements
- Hole-punch tests: Complex topology with NAT routers, relay servers, and isolated LANs
The hole-punch tests create five containers per test with three networks, simulating realistic NAT traversal scenarios.
What Changed: By the Numbers
Between commits f58b7472 and d6e5bea1:
- 196 commits of focused development
- 284 files changed
- +73,488 insertions, -49,912 deletions
- 11 new documentation files (more than 6,000 lines)
- Migration from TypeScript to more than 4,000 lines of shared bash libraries
The result is a simpler, more maintainable codebase that's easier to understand and extend.
Test Suites
Performance Benchmarking (perf/)
Measures the overhead that libp2p introduces:
- Upload throughput (bytes/second)
- Download throughput (bytes/second)
- Latency with statistical distribution (min, q1, median, q3, max, outliers)
Baseline tests against iperf, raw QUIC, and HTTPS establish reference points for measuring libp2p overhead.
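To make the latency distribution concrete, here is a minimal sketch of a five-number summary computed with `sort` and `awk`. It is an illustration only: `latency_summary` is a hypothetical name, quartiles use linear interpolation between neighboring samples (one of several common definitions, and not necessarily the one the perf suite uses), and outlier detection is left out.

```bash
# Hypothetical helper: read one latency sample per line on stdin and print
# min, q1, median, q3, and max. Quartiles interpolate linearly between the
# two nearest sorted samples.
latency_summary() {
  sort -n | awk '
    { v[NR] = $1 }
    function q(p,   pos, lo, frac) {
      pos = 1 + (NR - 1) * p
      lo = int(pos)
      frac = pos - lo
      if (lo >= NR) return v[NR]
      return v[lo] + frac * (v[lo + 1] - v[lo])
    }
    END {
      if (NR == 0) exit 1
      printf "min=%s q1=%s median=%s q3=%s max=%s\n", v[1], q(0.25), q(0.5), q(0.75), v[NR]
    }'
}

# Usage: printf '%s\n' 5 3 1 4 2 | latency_summary
# prints: min=1 q1=2 median=3 q3=4 max=5
```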
Transport Interoperability (transport/)
Verifies cross-implementation compatibility:
- Dial success/failure
- Handshake latency
- Ping latency
Tests run in parallel (default: CPU core count) for fast feedback on large test matrices.
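A worker pool sized to the CPU count can be sketched with `getconf` and `xargs`. Both function names below are hypothetical, and `xargs -P` is a widely available extension rather than part of POSIX; the framework's own scheduler may work quite differently.

```bash
# Default worker count: the number of online CPU cores, falling back to 1
# when it cannot be determined.
default_workers() {
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
  case $n in
    ''|*[!0-9]*) n=1 ;;
  esac
  printf '%s\n' "$n"
}

# Hypothetical runner: read one test name per line from stdin and run the
# given command once per name, with at most $1 running at a time.
run_parallel() {
  workers=$1
  shift
  xargs -P "$workers" -I {} "$@" {}
}

# Usage: printf 'a\nb\nc\n' | run_parallel "$(default_workers)" echo
```

Completion order is not guaranteed with parallel workers, so anything that consumes the output should not depend on line order.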
Hole-Punch NAT Traversal (hole-punch/)
Tests the DCUtR protocol for establishing direct connections through NAT:
- Realistic network topology with NAT routers
- Relay server coordination
- Direct connection verification
Each test gets unique subnets calculated from the test key, enabling parallel execution without network conflicts.
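One way to derive a subnet deterministically from a test key is to hash the key and map the result into a private address range. The sketch below is an assumption about the general technique, not the suite's actual scheme: `subnet_for_key` is hypothetical, it yields a single /24 where a real hole-punch test needs several networks, and with only 65,536 possible values two keys can collide.

```bash
# Hypothetical mapping from a test key to a private /24 subnet.
# cksum provides a stable 32-bit CRC; two of its bytes select 10.x.y.0/24.
subnet_for_key() {
  crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  printf '10.%d.%d.0/24\n' $(( (crc >> 8) % 256 )) $(( crc % 256 ))
}

# Usage: subnet_for_key "some-test-key"
```

Because the subnet depends only on the key, rerunning the same test always lands on the same addresses, which also helps when comparing logs between runs.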
Implementation Coverage
The test suite covers implementations in:
- Rust (rust-libp2p)
- Go (go-libp2p)
- JavaScript (js-libp2p v1.x, v2.x, v3.x)
- Python (py-libp2p)
- Nim (nim-libp2p)
- JVM (jvm-libp2p)
- C (c-libp2p)
- .NET (dotnet-libp2p)
- Zig (zig-libp2p)
- Browsers (via WebRTC and WebTransport)
In total, the suite covers 40+ implementation variations across different versions and configurations.
Getting Started
Check Dependencies
cd perf
./run.sh --check-deps
List Available Implementations
./run.sh --list-images
Preview Test Selection
./run.sh --impl-select "~rust" --list-tests
Run Tests
# Performance tests with rust implementations
cd perf
./run.sh --impl-select "~rust" --iterations 5
# Transport interoperability
cd transport
./run.sh --impl-select "~rust|~go"
# Hole-punch tests
cd hole-punch
./run.sh --impl-select "~rust"
Create Reproducible Snapshots
./run.sh --impl-select "~rust" --snapshot
The snapshot captures everything needed to reproduce the test run.
Reproducibility with inputs.yaml
Every test run generates an inputs.yaml file capturing:
- All command-line arguments
- Environment variables
- Filter settings
- Test-specific parameters
To reproduce a previous run:
cp /srv/cache/test-run/perf-abc12345/inputs.yaml ./
./run.sh
The framework reads inputs.yaml at startup and applies the same configuration.
Future Work
- Remote host testing via Docker Swarm for real network conditions
- Additional test suites for other protocols
- Improved reporting with historical comparisons
- Community contributions welcome for new implementations
Resources
- test-plans repository
- Local Testing Strategies - Installation, filtering, and patching
- Write a Performance Test Application
- Write a Transport Test Application
- Write a Hole-Punch Test Application
We believe this rewrite significantly improves the developer experience for testing libp2p implementations. The combination of cross-platform support, powerful filtering, reproducibility, and comprehensive documentation makes it easier than ever to ensure your libp2p implementation works correctly with the rest of the ecosystem.
Try it out, and let us know what you think!