Useless benchmarks. Outperforming Unix sort by more than 7 times with 30 lines of Rust
Hey, Medium! Yesterday I read an interesting article about writing a faster analog of wc
in Haskell. After reading it, I thought about writing some tools in Rust myself, so I decided to write a simple sort tool and benchmark it.
Disclaimer
I know that Unix sort does much more work than the Rust solution; all of the benchmarks here were done just for fun.
Conditions
Testing data
I downloaded a CSV file for testing:
du -sh part.csv
179M part.csv
Specifications
All tests were run on a MacBook Pro (16-inch, 2019) with an 8-core Intel Core i9 and 16 GB of RAM.
Testing tool
All tests were run with hyperfine:
hyperfine --version
hyperfine 1.9.0
The Rust code was built with rustc 1.41:
rustc -V
rustc 1.41.0 (5e1a79984 2020-01-27)
Cargo.toml optimizations
[profile.release]
opt-level = 3
Unix sort version
sort --version
2.3-Apple (101.40.1)
Let’s start
Naive implementation
The first implementation that came to my mind looked like this:
use std::collections::BinaryHeap;
use std::env;
use std::fs::File;
use std::io::{BufRead, BufReader, Write};

fn main() -> std::io::Result<()> {
    let args: Vec<String> = env::args().collect();
    let path = &args[1];

    let file = File::open(path).unwrap();
    let reader = BufReader::new(file);

    let stdout = std::io::stdout();
    let mut writer = stdout.lock();

    let heap: BinaryHeap<String> = reader
        .lines()
        .map(|line| line.unwrap())
        .fold(BinaryHeap::new(), |mut heap, line| {
            heap.push(line + "\n");
            heap
        });

    write!(
        writer,
        "{}",
        heap.into_sorted_vec()
            .iter()
            .fold(String::new(), |data, line| data + line)
    )
}
It was already almost 4 times faster than the original sort, but it could still be improved.
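To see why the heap version works at all: BinaryHeap in the Rust standard library is a max-heap, and into_sorted_vec drains it into ascending order. A minimal sketch of that behavior (a standalone illustration, not code from the original post):

```rust
use std::collections::BinaryHeap;

// Sort strings by pushing them into a max-heap and draining it,
// mirroring the naive implementation above.
fn heap_sort(lines: Vec<&str>) -> Vec<String> {
    let mut heap = BinaryHeap::new();
    for line in lines {
        // Each push is O(log n).
        heap.push(line.to_string());
    }
    // into_sorted_vec repeatedly pops the maximum and fills the
    // vector from the back, so the result is ascending.
    heap.into_sorted_vec()
}

fn main() {
    let sorted = heap_sort(vec!["pear", "apple", "orange"]);
    assert_eq!(sorted, vec!["apple", "orange", "pear"]);
    println!("{:?}", sorted);
}
```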
Moving heap to vector
Initially, I assumed that using a heap to store the lines of the file would be as efficient as possible, but that was not the case. After using the standard Rust mechanism to collect the iterator into a vector, I got roughly a 39% speedup (2.317 s versus 3.22 s).
use std::env;
use std::fs::File;
use std::io::{BufRead, BufReader, Write};

fn main() -> std::io::Result<()> {
    let args: Vec<String> = env::args().collect();
    let path = &args[1];

    let file = File::open(path).unwrap();
    let reader = BufReader::new(file);

    let stdout = std::io::stdout();
    let mut writer = stdout.lock();

    let mut heap: Vec<String> = reader.lines().map(|line| line.unwrap()).collect();
    heap.sort();

    write!(
        writer,
        "{}",
        heap.iter().fold(String::new(), |data, line| data + line + "\n")
    )
}
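The key change is collecting the line iterator straight into a Vec and sorting it in place. A minimal sketch of the same pattern, using an in-memory byte slice in place of the file (an illustration of mine, not code from the post):

```rust
use std::io::BufRead;

// Collect lines from any BufRead into a vector and sort in place.
fn sort_lines(input: &str) -> Vec<String> {
    let mut lines: Vec<String> = input
        .as_bytes() // &[u8] implements BufRead, standing in for the file
        .lines()
        .map(|line| line.unwrap())
        .collect();
    // Vec::sort is a stable sort, comparing strings lexicographically.
    lines.sort();
    lines
}

fn main() {
    let sorted = sort_lines("pear\napple\norange\n");
    assert_eq!(sorted, vec!["apple", "orange", "pear"]);
    println!("{}", sorted.join("\n"));
}
```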
It would seem that one could stop here, but what if we use parallel sorting (with the MergeSort algorithm)?
Parallel Version
The parallel version differs from the previous one by just two lines of code:
+use rayon::prelude::*;

-heap.sort();
+heap.par_sort();
Yes, in Rust we just added the library to Cargo.toml and can now use parallel sorting! This gives a 68% improvement over the previous version.
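For completeness, wiring rayon in looks something like this in Cargo.toml (the version number here is my assumption, not taken from the post):

```toml
[dependencies]
rayon = "1.3"
```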
Conclusion
Of course, this code does not even begin to replace the standard sort utility; everything here was written only for fun and useless benchmarks. Thanks for reading!
Updates
1
As readers on Reddit noted, running the utility with the environment variable LC_ALL=C
greatly reduces its execution time. Therefore, here is a final comparison with this variable set.
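The effect is easy to demonstrate: with LC_ALL=C, sort compares raw bytes instead of applying locale-aware collation rules, which is both faster and closer to what the Rust code does. A small illustration (mine, not from the original post):

```shell
# With the C locale, sort compares raw bytes, so uppercase letters
# (smaller byte values) come before lowercase ones.
printf 'banana\nApple\ncherry\n' | LC_ALL=C sort
# Apple
# banana
# cherry
```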