Reputation: 1301
I was trying out some benchmarking of the multi-threaded webserver example in the Rust book, and for comparison I built something similar in Go and ran a benchmark using ApacheBench. Though it's a simple example, the difference was far too large: the Go web server doing the same work was about 10 times faster. Since I was expecting Rust to be faster or at least at the same level, I tried multiple revisions using futures and smol (though my goal was to compare implementations using only the standard library), but the result was almost the same. Can anyone here suggest changes to the Rust implementation to make it faster without using a huge thread count?
Here is the code I used: https://github.com/deepu105/concurrency-benchmarks
The tokio-http version is the slowest; the other three Rust versions give almost the same result.
Here are the benchmarks:
Rust (with 8 threads; with 100 threads the numbers are closer to Go's):
❯ ab -c 100 -n 1000 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests
Server Software:
Server Hostname: localhost
Server Port: 8080
Document Path: /
Document Length: 176 bytes
Concurrency Level: 100
Time taken for tests: 26.027 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 195000 bytes
HTML transferred: 176000 bytes
Requests per second: 38.42 [#/sec] (mean)
Time per request: 2602.703 [ms] (mean)
Time per request: 26.027 [ms] (mean, across all concurrent requests)
Transfer rate: 7.32 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 2 2.9 1 16
Processing: 4 2304 1082.5 2001 5996
Waiting: 0 2303 1082.7 2001 5996
Total: 4 2307 1082.1 2002 5997
Percentage of the requests served within a certain time (ms)
50% 2002
66% 2008
75% 2018
80% 3984
90% 3997
95% 4002
98% 4005
99% 5983
100% 5997 (longest request)
Go:
ab -c 100 -n 1000 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests
Server Software:
Server Hostname: localhost
Server Port: 8080
Document Path: /
Document Length: 174 bytes
Concurrency Level: 100
Time taken for tests: 2.102 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 291000 bytes
HTML transferred: 174000 bytes
Requests per second: 475.84 [#/sec] (mean)
Time per request: 210.156 [ms] (mean)
Time per request: 2.102 [ms] (mean, across all concurrent requests)
Transfer rate: 135.22 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 2 1.4 2 5
Processing: 0 203 599.8 3 2008
Waiting: 0 202 600.0 2 2008
Total: 0 205 599.8 5 2013
Percentage of the requests served within a certain time (ms)
50% 5
66% 7
75% 8
80% 8
90% 2000
95% 2003
98% 2005
99% 2010
100% 2013 (longest request)
Upvotes: 3
Views: 3586
Reputation: 1301
I was finally able to get similar results in Rust using the async_std library:
❯ ab -c 100 -n 1000 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests
Server Software:
Server Hostname: localhost
Server Port: 8080
Document Path: /
Document Length: 176 bytes
Concurrency Level: 100
Time taken for tests: 2.094 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 195000 bytes
HTML transferred: 176000 bytes
Requests per second: 477.47 [#/sec] (mean)
Time per request: 209.439 [ms] (mean)
Time per request: 2.094 [ms] (mean, across all concurrent requests)
Transfer rate: 90.92 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 2 1.7 2 7
Processing: 0 202 599.7 2 2002
Waiting: 0 201 600.1 1 2002
Total: 0 205 599.7 5 2007
Percentage of the requests served within a certain time (ms)
50% 5
66% 6
75% 9
80% 9
90% 2000
95% 2003
98% 2004
99% 2006
100% 2007 (longest request)
Here is the implementation:
use async_std::net::TcpListener;
use async_std::net::TcpStream;
use async_std::prelude::*;
use async_std::task;
use std::fs;
use std::time::Duration;

#[async_std::main]
async fn main() {
    let mut count: i64 = 0;
    let listener = TcpListener::bind("127.0.0.1:8080").await.unwrap(); // set listen port
    loop {
        count += 1;
        let (stream, _) = listener.accept().await.unwrap();
        // spawn a new async task (not an OS thread) to handle the connection;
        // i64 is Copy, so the count can be passed by value without boxing
        task::spawn(handle_connection(stream, count));
    }
}

async fn handle_connection(mut stream: TcpStream, count: i64) {
    // read the first 1024 bytes of data from the stream
    let mut buffer = [0; 1024];
    stream.read(&mut buffer).await.unwrap();
    // add a 2-second delay to every 10th request
    if count % 10 == 0 {
        println!("Adding delay. Count: {}", count);
        task::sleep(Duration::from_secs(2)).await;
    }
    let contents = fs::read_to_string("hello.html").unwrap(); // read the HTML file
    let response = format!("HTTP/1.1 200 OK\r\n\r\n{}", contents);
    // write_all ensures the whole response is sent even if the socket
    // accepts only a partial write
    stream.write_all(response.as_bytes()).await.unwrap();
    stream.flush().await.unwrap();
}
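Because task::sleep is an async sleep, a delayed task yields its executor thread instead of blocking it, so the runtime keeps serving the other connections while the 2-second delays elapse. That is what removes the thread-pool bottleneck of the earlier versions.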
Upvotes: 3
Reputation: 198
I only compared your "rustws" and the Go version. In Go you have an unlimited number of goroutines (even though you limit them all to a single CPU core), while in rustws you create a thread pool with only 8 threads.
Since your request handlers sleep for 2 seconds on every 10th request, the sleeps dominate: each of the 8 threads completes roughly 10 requests per 2-second window, so the rustws version is capped at 8 × 10 / 2 = 40 requests per second, which is what you are seeing in the ab results (38.42 requests per second). Go does not suffer from this arbitrary bottleneck, so it shows you the maximum it can handle on a single CPU core.
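To confirm that the pool size, not Rust itself, is the limit, here is a minimal hypothetical sketch (not from the linked repo) using only the standard library that spawns one OS thread per connection, mirroring Go's goroutine-per-request model. It assumes the same hello.html file and the same 2-second delay on every 10th request:

use std::fs;
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

fn main() {
    let listener = TcpListener::bind("127.0.0.1:8080").unwrap();
    let mut count: u64 = 0;
    for stream in listener.incoming() {
        count += 1;
        let stream = stream.unwrap();
        // one OS thread per connection, so a sleeping handler
        // never blocks the other connections
        thread::spawn(move || handle_connection(stream, count));
    }
}

fn handle_connection(mut stream: TcpStream, count: u64) {
    // read the first 1024 bytes of the request
    let mut buffer = [0; 1024];
    stream.read(&mut buffer).unwrap();
    // same artificial delay as the benchmark: every 10th request sleeps 2 seconds
    if count % 10 == 0 {
        thread::sleep(Duration::from_secs(2));
    }
    let contents = fs::read_to_string("hello.html").unwrap();
    let response = format!("HTTP/1.1 200 OK\r\n\r\n{}", contents);
    stream.write_all(response.as_bytes()).unwrap();
    stream.flush().unwrap();
}

OS threads are heavier than goroutines, so an async runtime (as in the accepted answer) scales better, but even this naive version removes the 8-thread cap.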
Upvotes: 2