Day 6 is a fun one. The task itself is very simple, but I took it as an opportunity to do some performance benchmarking and explore some of the nuances around how implementation affects performance. Read on for more details!
The Repository
All of the code for this day's solutions, as well as the data input file, can be found at:
https://github.com/wyhaines/advent-of-code-2022/tree/main/day_6
Part 1
To see the full details of the challenge, visit the Advent of Code 2022 - Day 6 page. However, the tl;dr version of the challenge is as follows:
You and the Elves are on your way to the star fruit grove, and one of the Elves gives you a handheld device. The device has many fancy features, but the most important one to set up right now is the communication system. Unfortunately, the device is malfunctioning. However, the Elf is confident that you, with your significant experience dealing with signal-based systems, will have no problem fixing it. Just then, the device emits a few colorful sparks.
To fix the communication system, you need to add a subroutine to the device that detects a start-of-packet marker in the data stream. In the protocol being used by the Elves, the start of a packet is indicated by a sequence of four characters that are all different. The device will send your subroutine a datastream buffer (your puzzle input); your subroutine needs to identify the first position where the four most recently received characters were all different. Specifically, it needs to report the number of characters from the beginning of the buffer to the end of the first such four-character marker.
For example, suppose you receive the following datastream buffer:
mjqjpqmgbljsphdztnvjfqwrcgsmlb
After the first three characters (
mjq
) have been received, but there haven't been enough characters received yet to find the marker. The first time a marker could occur is after the fourth character is received, making the most recent four charactersmjqj
. Becausej
is repeated, this isn't a marker. The first time a marker appears is after the seventh character arrives. Once it does, the last four characters received arejpqm
, which are all different. In this case, your subroutine should report the value 7, because the first start-of-packet marker is complete after 7 characters have been processed.
Essentially, the task is to put a 4-character sliding window over the string of characters and then check if all four characters in that sliding window are unique. If they are, then return the index of the first character after that 4-character block.
The Approach
Slurp the input into a string. There is no additional processing this time!
Iterate over a sliding window of 4 characters, passing those characters into another method that will determine if the characters are unique. Return the index +4 of that window when a unique set is found.
Write a method to determine if a given 4 characters are unique.
Ruby solution
Ruby makes so much about programming so easy.
# frozen_string_literal: true
class TuningTrouble
def initialize(filename)
@data = parse_packet_data(filename)
end
def run
puts "The start of packet marker is at character ##{find_start_of_packet}"
end
def parse_packet_data(filename)
File.read(filename)
end
def find_start_of_packet
idx = 0
@data.chars.each_cons(4) do |chunk|
return idx + 4 if chunk.uniq == chunk
idx += 1
end
'No packet found'
end
end
TuningTrouble.new(ARGV[0] || 'input.txt').run
So, caveat emptor. This version isn't the finished version. Rather, it is the first version. The simplest, most concise version. It's also the slowest. If you look at the code in the repository, you will see all of the variations that are discussed below, commented out. You can try any of them just by uncommenting the relevant methods.
In a celebration of simplicity, today's task has the simplest data parsing possible. i.e. the data in the data file is already ready to use without any other changes, so there is nothing to talk about regarding the parsing, so let's look at the solution:
def find_start_of_packet
idx = 0
@data.chars.each_cons(4) do |chunk|
return idx + 4 if chunk.uniq == chunk
idx += 1
end
'No packet found'
end
This is nicely concise. Ruby has a method, String#chars
that can be used to get an array of characters from a string. There is also a method, Enumerable#each_cons
, which is mixed into Array, which returns an enumerator that iterates over a sliding window of elements. So, in this example, each_cons(4)
returns an iterator that will iterate over all 4 character windows of the original string. Because those windows are arrays, the uniq
method can be used to eliminate any duplicated characters. If the result of calling that method equals the original chunk, then the start-of-packet marker has been found.
This version is safe for the full UTF-8 character set, but it is quite slow, taking about 17.7 seconds on my laptop, using Ruby 3.2.0 +YJIT. If @data.chars.each_cons
is changed to @data.bytes.each_cons
, making the code safe only for ASCII inputs, performance jumps to about 7.2 seconds.
Another variation that is nearly as concise, but faster, is this one.
def parse_packet_data(filename)
File.read(filename)
end
def no_duplicate_characters?(chars)
chars.split('').uniq.size == chars.size
end
def find_start_of_packet
(@data.size - 4).times do |idx|
return idx + 4 if no_duplicate_characters?(@data[idx, 4])
end
'No packet found'
end
Because the sliding window is 4 characters long, it can take no more than @data.size - 4
iterations to find the starting packet if one exists. Inside that loop, return the idx +4
if no duplicate characters are found in the sliding window that is composed of the 4 characters starting at idx
. This version eliminates the use of each_cons
, which also eliminates manually tracking the actual position of the sliding window, and just directly implements it with a loop and a string slide.
For ease of reading, I moved the check for duplicate characters into a separate method. This doesn't notably effect the Ruby solution execution times, as other things have so much more effect, and it makes them more readable.
def no_duplicate_characters?(chars)
chars.split('').uniq.size == chars.size
end
This code splits the character string into an array of individual characters, then runs it through uniq
to eliminate any duplicated characters, and compares to see if the result is the same size as the original.
For the data set that this task is working with, speed is largely irrelevant, but since the solution for this day is so much simpler than in past days, I'm going to take a closer look at performance.
I created an input file that has 4 megabytes of guaranteed repetitions in every 4-character window, followed by the contents of the original input file. On that set, this original solution finishes in about 19 seconds on my laptop.
Since our data set is composed only of ASCII characters, this can be improved.
def no_duplicate_characters?(chars)
chars.bytes.uniq.size == chars.size
end
The String#bytes
method returns an array of the bytes that make up the string. Since the input data is all ASCII, this is safe (ASCII characters are all single-byte characters). Because the implementation doesn't have to worry about multi-byte characters, this is much faster to execute than the split('')
call, and in truth, this is pretty close to as fast as you can make this in Ruby.
The second version (chars.bytes.uniq.size == chars.size
) finished in about 8.7 seconds on my laptop.
Can that be improved? What if the method that checks for duplicate characters is just a big boolean
statement that explicitly checks the options?
def no_duplicate_characters?(chars)
!( chars[0] == chars[1] ||
chars[0] == chars[2] ||
chars[0] == chars[3] ||
chars[1] == chars[2] ||
chars[1] == chars[3] ||
chars[2] == chars[3] )
end
That version runs in about 8.4 seconds, and it works with all UTF-8 characters, too!
Can it be faster? Yes. If parse_packet_data
is revisited, and it is modified to return an array of bytes instead of a string, the speedup is substantial.
def parse_packet_data(filename)
File.read(filename).bytes
end
This version is what you will find in the repo (with comments explaining the other version). It runs in about 2.2 seconds on my laptop.
Ruby's Fastest Version
There is one final improvement that can be made to the Ruby version. All of the previous versions created new strings or arrays with every call into no_duplicate_characters?
. Apart from the obvious performance hit from duplicating data that already exists somewhere else, object creation and cleanup are both expensive in Ruby, so creating large numbers of redundant, short-lived objects impacts performance. However, because all of the data that is required is directly accessible in the @data
instance variable, this can be dealt with by just passing an index into no_duplicate_characters?
instead of a string or array of bytes.
def no_duplicate_characters?(idx)
!(@data[idx] == @data[idx + 1] ||
@data[idx] == @data[idx + 2] ||
@data[idx] == @data[idx + 3] ||
@data[idx + 1] == @data[idx + 2] ||
@data[idx + 1] == @data[idx + 3] ||
@data[idx + 2] == @data[idx + 3])
end
def find_start_of_packet
(@data.size - 4).times do |idx|
return idx + 4 if no_duplicate_characters?(idx)
end
'No packet found'
end
This version is running in about 1.32 seconds on my laptop.
So, how do the other languages stack up?
Crystal solution (winner for most concise)
Part of Crystal's awesomeness is that it has the same sort of productive, expressive syntax as Ruby, but it provides a fantastic type system and creates fast-executing code, as well. Consider:
class TuningTrouble
@data : String
def initialize(filename)
@data = parse_packet_data(filename)
end
def run
puts "The start of packet marker is at character ##{find_start_of_packet}"
end
def parse_packet_data(filename)
File.read(filename)
end
def find_start_of_packet
@data.chars.each_cons(4, reuse: true).index! {|chunk| chunk.uniq == chunk} + 4
end
end
TuningTrouble.new(ARGV[0]? || "input.txt").run
Crystal wins the concise code battle. This is spiritually idential to the original Ruby solution before we started trying to find faster versions, but it is even more terse! The definition of Array
in Crystal mixes in Indexable
, which has a method, Indexable#index
that returns the index of the first object for which the block returns a truthy value. The index!
version just returns an exception if no index is found. This eliminates manual counting of that index and makes this a very terse solution. Now, recall that the Ruby version of the above code takes about 17.7 seconds.
Crystal does it in about 1.27 seconds, and it is safe for the full UTF-8 character set. In fact, changing chars
to bytes
, in this example, offers no benefits. The next few examples show other fairly terse versions, but none are as fast as this until we get to the more complicated versions.
@data : String
def parse_packet_data(filename)
File.read(filename)
end
def no_duplicate_characters?(chars)
chars.split("").uniq.size == chars.size
end
def find_start_of_packet
(@data.size - 4).times do |idx|
return idx + 4 if no_duplicate_characters?(@data[idx, 4])
end
"No packet found"
end
This one does manual window management. It is easy to understand, but it is slower than the more concise version, at about 5.5 seconds.
Now look at this one. It turns out that a solution that I didn't even mention in the Ruby section because it is so slow (43 seconds on the 4mb input file) is faster in Crystal:
def no_duplicate_characters?(chars)
chars.chars.to_set.size == chars.size
end
This is tremendously slow in Ruby, but in Crystal, it takes Ruby's 43 seconds down to about 3.9 seconds, and it's pretty terse, too!
What about this change?
def no_duplicate_characters?(chars)
chars.bytes.uniq.size == chars.size
end
This works on Crystal just like in Ruby, and performance increases to about 2.4 seconds.
Of course, the fastest Ruby versions use that big boolean statement. So what happens if Crystal is changed accordingly?
def no_duplicate_characters?(chars)
!( chars[0] == chars[1] ||
chars[0] == chars[2] ||
chars[0] == chars[3] ||
chars[1] == chars[2] ||
chars[1] == chars[3] ||
chars[2] == chars[3] )
end
This version runs in about 0.92 seconds on Crystal and is UTF-8-safe.
Recall that the fastest Ruby version got its final speedup by indexing directly into the main buffer to avoid the creation of large numbers of small objects. What happens with the Crystal version if that same change is made?
def no_duplicate_characters?(idx)
!( @data[idx] == @data[idx + 1] ||
@data[idx] == @data[idx + 2] ||
@data[idx] == @data[idx + 3] ||
@data[idx + 1] == @data[idx + 2] ||
@data[idx + 1] == @data[idx + 3] ||
@data[idx + 2] == @data[idx + 3] )
end
def find_start_of_packet
(@data.size - 4).times do |idx|
return idx + 4 if no_duplicate_characters?(idx)
end
"No packet found"
end
Execution time drops to around 0.465 seconds.
Now, what happens if we make changes so that it only works with ASCII input data? This will require changing the type of the @data
instance variable, and a small change to the parse_packet_data
method.
@data : Bytes
def parse_packet_data(filename)
File.read(filename).to_slice
end
def no_duplicate_characters?(idx)
!( @data[idx] == @data[idx + 1] ||
@data[idx] == @data[idx + 2] ||
@data[idx] == @data[idx + 3] ||
@data[idx + 1] == @data[idx + 2] ||
@data[idx + 1] == @data[idx + 3] ||
@data[idx + 2] == @data[idx + 3] )
end
def find_start_of_packet
(@data.size - 4).times do |idx|
return idx + 4 if no_duplicate_characters?(idx)
end
"No packet found"
end
The String#to_slice
method returns a read-only Slice
of bytes, which Crystal provides an alias for, Bytes
, which provides direct access to the memory buffer which contains the String
data.
This one change takes the execution time down to about 0.0585 seconds. The decimal point is in the right place. That is about 23x faster than the fastest Ruby version, with code that isn't significantly different from what Ruby is doing. Certainly, that is as fast as Crystal can get, right?
As it turns out, no. Crystal has a few more tricks.
def no_duplicate_characters?(idx)
seen = StaticArray(UInt8, 4).new(@data[idx])
step = 1
while step < 4
if seen.includes?(@data[idx + step])
return false
end
seen[step] = @data[idx + step]
step += 1
end
true
end
def find_start_of_packet
(@data.size - 4).times do |idx|
return idx + 4 if no_duplicate_characters?(idx)
end
"No packet found"
end
This version might take a little explanation. A StaticArray
is an array that is allocated on the stack, instead of on the heap. The effect of this is that the allocation and deallocation of the memory that it uses are very fast. In Crystal, one can use a StaticArray
pretty much anywhere where the size of the array is fixed, and where that size is known at compile time. In this case, we know the maximum amount of previously-seen characters, so we can use a StaticArray
to very quickly allocate that space. Additionally, because the first byte to be checked can't match anything in the otherwise empty seen
array, the code can save the work of checking that first byte by seeding the seen
array with its value, and starting checks with the second byte.
It then iterates through the remaining three, checking to see if any of those characters are in the seen
array and it adds them if they are not.
This version is the fastest. It averages about 0.049 seconds on my laptop, to process the 4 MB benchmarking file.
Crystal has one more tool that developers can exploit, though. The libraries that make up the Crystal standard library are all open. This means that if there is something that a person wants to add or change, even if it is part of the core libraries, one can.
Crystal's Fastest Version
One of the challenges to making the Crystal solution faster is that while the data that is needed is all available in memory, every solution preceding the next one is forced to copy pieces of it to another area of memory. Copying data takes more time than not copying data. If there were a way to check a section of a Slice
to see if any of its elements match an argument, without copying anything or creating any new objects, the algorithm should be even faster.
struct Slice(T)
@[AlwaysInline]
def unsafe_includes_in_window?(offset, size, value)
size.times do |idx|
return true if self.unsafe_fetch(offset + idx) == value
end
false
end
end
And then, given that, here is the fastest Crystal version for this puzzle.
@data : Bytes
def parse_packet_data(filename)
File.read(filename).to_slice
end
def no_duplicate_characters?(idx)
step = 1
while step < 4
if @data.unsafe_includes_in_window?(idx, step, @data.unsafe_fetch(idx + step))
return false
end
step += 1
end
true
end
def find_start_of_packet
(@data.size - 4).times do |idx|
return idx + 4 if no_duplicate_characters?(idx)
end
"No packet found"
end
A few quick notes about this solution. The @[AlwaysInline]
directive is a hint to the compiler that it should inline uses of this method into the generated code. For something that will be called a lot, this can improve speed, and it seems to have a small but measurable effect with this code. Also, the Slice#unsafe_fetch
method is like Slice#[]
, but it doesn't check to make sure that the offset being fetched is within the limits of the data allocated to the Slice before trying to access the data, and it is also defined with @[AlwaysInline]
.
This version runs in a lightning-fast 0.0355 seconds, on average.
So how does Rust compare? (tl;dr -- it's neck and neck with Crystal, but the fastest Crystal version ends up just edging out the fastest Rust version).
Rust solution
In another first, the Rust solution is, overall, almost-ish as concise as the Ruby/Crystal versions this time!
use std::env;
use std::collections::HashSet;
struct TuningTrouble {
data: String,
}
impl TuningTrouble {
fn new(filename: &str) -> TuningTrouble {
let data = Self::parse_packet_data(filename).unwrap();
TuningTrouble { data }
}
fn parse_packet_data(filename: &str) -> Option<String> {
let data = std::fs::read_to_string(filename).ok()?;
Some(data)
}
fn run(&mut self) {
println!("The start of packet marker is at character #{}", self.find_start_of_packet())
}
fn no_duplicate_characters(&self, s: &[u8]) -> bool {
let mut set = HashSet::new();
for c in s {
set.insert(c);
}
set.len() == s.len()
}
fn find_start_of_packet(&mut self) -> usize {
self.data
.as_bytes()
.windows(4)
.position(|chunk| self.no_duplicate_characters(chunk))
.unwrap()
+ 4
}
}
fn main() {
let args: Vec<String> = env::args().collect();
let mut filename = "input.txt";
if let Some(arg) = args.get(1) {
filename = arg;
}
TuningTrouble::new(filename).run();
}
So, sneak preview, but just like with the two previous languages, this version is the slowest. Read on or peek at the code in the repository for the fastest version.
The details of this first version of the solution are found in the same two methods as in the other languages:
fn no_duplicate_characters(&self, s: &[u8]) -> bool {
let mut set = HashSet::new();
for c in s {
set.insert(c);
}
set.len() == s.len()
}
fn find_start_of_packet(&mut self) -> usize {
self.data
.as_bytes()
.windows(4)
.position(|chunk| self.no_duplicate_characters(chunk))
.unwrap()
+ 4
}
The find_start_of_packet()
method is really interesting. For reasons that are explained a little later, Rust doesn't support indexing into strings, which makes the creation of a sliding window of string data more tricky. There's work there for the developer. This solution gets around that, though, by calling as_bytes()
on the string, which returns an array of bytes.
That data structure does support creating sliding windows, via the windows()
method. And then, much like the index
method in Crystal, position()
iterates through each of those windows, and when the associated block returns a true value, it returns the position in the iterator at which that true value occurred.
This solution to detecting duplicate characters in a string uses a HashSet. In Rust, there is no method equivalent to Crystal or Ruby's uniq
function, which allows for a concise solution in those languages. To achieve a similar result in Rust, the method iterates over the characters in the string and inserts them into a HashSet
. Since a HashSet
only stores unique values, any duplicates will be automatically discarded. After all of the characters have been processed, the length of the HashSet
is compared to the length of the original string. Equality returns a true value as it indicates that no duplicated characters were discarded from the HashSet
.
This is concise but it does a lot of extra work in populating a fairly complex data structure (HashSet
) only to check its length. Thus, as the Rust solutions go, it is the slowest, at about 2.00 seconds. Can it be improved?
fn no_duplicate_characters(&self, s: &str) -> bool {
let mut chars = s.chars();
let mut seen = vec![];
while let Some(chr) = chars.next() {
if seen.contains(&chr) {
return false;
}
seen.push(chr);
}
true
}
This version builds a simple vec
of seen characters, similar to the fastest Crystal solution. The code is fairly clear. It uses a while let
loop to iterate through the characters, checking to see if they have been seen before, and returning false if they have been. Alternatively, if a given character has not been seen, it is pushed into the seen
vector, and the next one is checked.
Recall, though, that the vec
type is a dynamic array that is allocated on the heap. This would be better if it were built with a data structure that is faster to work with than a vec
. Also, there is another interesting consideration with Rust.
In both Ruby and in Crystal, indexing into a string operates at the character level. In either of those languages, if you have a string like ๐ฅ๐๐ฆ
you can correctly index into it:
tag = "๐ฅ๐๐ฆ"
puts tag[1..2]
This will result in an output of ๐๐ฆ
. If you try that in Rust, though, you will get a runtime error.
fn main() {
let tag = String::from("๐ฅ๐๐ฆ");
println!("{:?}", &tag[1..2]);
}
thread 'main' panicked at 'byte index 1 is not a char boundary; it is inside '๐ฅ' (bytes 0..4) of `๐ฅ๐๐ฆ`', junk.rs:4:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
This is because, while all three languages use UTF-8 strings, Rust doesn't permit one to index into them. Attempting to access a single index instead of a range isn't even allowed at the compilation level. A line like println!("{:?}", &tag[2]);
would fail to even compile.
This poses interesting challenges. In Ruby and in Crystal, it is safe, and will not error with any UTF-8 characters, to do something like call_method_foo(data[idx..idx+4])
. In Rust, though, that fails. So, if one wants to implement a solution that can handle all UTF-8 strings, what is one to do? Is it solvable to handle that issue and make our solution more like the fastest Crystal one, all at the same time?
fn sliding_windows<'a>(&self, data: &'a str, size: usize) -> impl Iterator<Item = &'a str> {
data.char_indices().flat_map(move |(from, _)| {
data[from..]
.char_indices()
.skip(size - 1)
.next()
.map(|(to, chr)| &data[from..from + to + chr.len_utf8()])
})
}
fn no_duplicate_characters(&self, s: &str) -> bool {
let chars = s.chars();
let mut seen = ['\x00', '\x00', '\x00', '\x00'];
let mut step = 0;
for char in chars {
if seen.contains(&char) {
return false;
}
seen[step] = char;
step += 1;
}
true
}
fn find_start_of_packet(&mut self) -> usize {
let mut idx = 0;
for window in self.sliding_windows(&self.data, 4) {
if self.no_duplicate_characters(window) {
return idx + 4;
}
idx += 1;
}
0
}
Yes. The tricky bit is that Rust makes the developer come up with their own iterator implementation for sliding windows that honor character boundaries. The only way that I could figure out to do that was to use char_indices()
to do the dirty work. If there is a better way, I would love to know about it.
Leveraging a sliding window iterator, the other changes can be seen in no_duplicate_characters()
. That method now uses an array instead of a vec
for the seen buffer, which is much more lightweight.
This version is UTF-8 safe, and also much faster, finding the correct answer from our 4 MB input file in about 0.2 seconds.
This performance is still substantially slower than the fastest Crystal version, but there are a couple of tricks still available for this Rust solution. Recall that in the Ruby and the Crystal versions, eliminating data copying by just passing an index to refer to the original data resulted in a significant speedup. Let's do that in the Rust version (and get rid of the laboriously created UTF-8-safe sliding windows code) and also try to use the boolean block approach.
fn no_duplicate_characters(&self, idx: usize, bytes: &[u8]) -> bool {
!(bytes[0] == bytes[1]
|| bytes[0] == bytes[2]
|| bytes[0] == bytes[3]
|| bytes[1] == bytes[2]
|| bytes[1] == bytes[3]
|| bytes[2] == bytes[3])
}
fn find_start_of_packet(&mut self) -> usize {
self.data
.as_bytes()
.windows(4)
.position(|window| self.no_duplicate_characters(0, window))
.unwrap()
+ 4
}
This is very similar to one of the fastest Crystal versions. The differences in performance are obscured by the background noise of my laptop's operation, but benchmarking put this code at about 0.049 seconds, which is just a bit faster than Crystal's 0.0585 seconds for a similar implementation.
So what happens if I go back to the version that builds an array of seen characters, but have it work on bytes instead of characters?
fn no_duplicate_characters(&self, bytes: &[u8]) -> ool {
let mut seen = [bytes[idx], 0, 0, 0];
let mut step = 1;
while step < 4 {
if seen.contains(&bytes[idx + step]) {
return false;
}
seen[step] = bytes[idx + step];
step += 1;
}
true
}
fn find_start_of_packet(&mut self) -> usize {
self.data
.as_bytes()
.windows(4)
.position(|window| self.no_duplicate_characters(window))
.unwrap()
+ 4
}
This version chews through data impressively quickly, benchmarking at 0.0427 seconds for the 4mb test file.
Rust's Fastest Version
Believe it or not, the best Rust solution is a little faster, and bucking the trend, is also actually Rust's most concise solution.
fn find_start_of_packet(&mut self) -> usize {
self.data
.as_bytes()
.windows(4)
.position(|window| {
window
.iter()
.enumerate()
.all(|(idx, c)| !window[..idx].contains(c))
})
.unwrap()
+ 4
}
This one might need some explanation of the code that determines if the characters are unique. An iterator is created from the sliding window, and then it is enumerated, which means that each of the characters in the window will be iterated over accompanied by their index in the window. The invocation of all()
means that the block that follows must return true for every iteration of its closure. That closure returns true if none of the unseen characters in the window match the current character being iterated. If none match, then all()
will return true, position()
will report the position, and when 4
is added to it, the start-of-packet marker has been found.
This version runs in about 0.0356 seconds, which basically means that with the 4 byte window, the performance of Crystal vs Rust is indistinguishable.
Whew! That was a lot of performance optimization with those different versions!
Part 2
It seems that the communication system on your device is correctly detecting packets, but it still isn't working. It looks like it needs to also search for messages. Start-of-message markers are similar to start-of-packet markers, but they consist of 14 distinct characters instead of 4.
The good thing here is that we aren't doing anything different in this second part of the solutions. All that is necessary is to identify a much larger window -- 14 characters -- instead of the much smaller 4-character window. For all of the fastest solutions, this change is as simple as changing a few numbers in the implementation, however. Let's see how they hold up.
The Approach
The approach is fundamentally no different than in part 1.
Ruby solution
# frozen_string_literal: true
class TuningTrouble
def initialize(filename)
@data = parse_packet_data(filename)
end
def run
puts "The start of packet marker is at character ##{find_start_of_packet}"
end
def parse_packet_data(filename)
File.read(filename).bytes
end
def no_duplicate_characters?(idx, window = 14)
pos = 0
while pos < window - 1
comp_pos = pos + 1
while comp_pos < window
return false if @data[idx + pos] == @data[idx + comp_pos]
comp_pos += 1
end
pos += 1
end
true
end
def find_start_of_packet
(@data.size - 14).times do |idx|
return idx + 14 if no_duplicate_characters?(idx)
end
'No packet found'
end
end
TuningTrouble.new(ARGV[0] || 'input.txt').run
This is the fastest Ruby solution. It completes the task on the large file in about 1.45 seconds.
Instead of having a single large boolean statement, it works by doing single comparisons inside of two tight loops. The outer loop iterates through the first through the 13th byte and checks if any of the characters that follow it are the same. If any are, then there are duplicated characters, so the method should return false.
def no_duplicate_characters?(idx, window = 14)
pos = 0
while pos < window - 1
comp_pos = pos + 1
while comp_pos < window
return false if @data[idx + pos] == @data[idx + comp_pos]
comp_pos += 1
end
pos += 1
end
true
end
If it completes the loops, then there are no duplicates, and true can be returned. The nice thing about this solution is that it is essentially free to make it generic. If a window size is passed into no_duplicate_characters?
, it will be used to determine the size of the search.
Crystal solution
struct Slice(T)
@[AlwaysInline]
def unsafe_includes_in_window?(offset, size, value)
size.times do |idx|
return true if self.unsafe_fetch(offset + idx) == value
end
false
end
end
class TuningTrouble
def initialize(filename)
@data = parse_packet_data(filename)
end
def run
puts "The start of packet marker is at character ##{find_start_of_packet}"
end
@data : Bytes
def parse_packet_data(filename)
File.read(filename).to_slice
end
def no_duplicate_characters?(idx)
step = 1
while step < 14
if @data.unsafe_includes_in_window?(idx, step, @data.unsafe_fetch(idx + step))
return false
end
step += 1
end
true
end
def find_start_of_packet
(@data.size - 14).times do |idx|
return idx + 14 if no_duplicate_characters?(idx)
end
"No packet found"
end
end
TuningTrouble.new(ARGV[0]? || "input.txt").run
The only changes are to change the 4
to 14
in a couple of places.
So, how fast is it when looking for a 14 character window instead of a 4 character window?
Truthfully, I can't measure any consistent difference. It tends to find the answer in the 4mb benchmarking file in about 0.0355 seconds.
So, what about Rust?
Rust solution
use std::env;
struct TuningTrouble {
data: String,
}
impl TuningTrouble {
fn new(filename: &str) -> TuningTrouble {
let data = Self::parse_packet_data(filename).unwrap();
TuningTrouble { data }
}
fn parse_packet_data(filename: &str) -> Option<String> {
let data = std::fs::read_to_string(filename).ok()?;
Some(data)
}
fn run(&mut self) {
println!(
"The start of packet marker is at character #{}",
self.find_start_of_packet()
)
}
fn no_duplicate_characters(&self, idx: usize, bytes: &[u8], seen: &mut [u8; 14]) -> bool {
seen[0] = bytes[idx];
for i in 1..14 {
seen[i] = 0;
}
let mut step = 1;
while step < 14 {
if seen.contains(&bytes[idx + step]) {
return false;
}
seen[step] = bytes[idx + step];
step += 1;
}
true
}
fn find_start_of_packet(&mut self) -> usize {
let bytes = self.data.as_bytes();
let mut seen = [0_u8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
for idx in 0..(self.data.len() - 14) {
if self.no_duplicate_characters(idx, &bytes, &mut seen) {
return idx + 14;
}
}
0
}
}
fn main() {
let args: Vec<String> = env::args().collect();
let mut filename = "input.txt";
if let Some(arg) = args.get(1) {
filename = arg;
}
TuningTrouble::new(filename).run();
}
Et voila! The Rust version follows a very similar pattern to the Crystal version, creating a reusable buffer that is cleared on every use instead of allocating a new one and expanding the buffer to 14 bytes.
This version is vastly slower than the Crystal one, clocking in at about 0.081 seconds, on average.
Conclusion
Whew! This was an interesting deep dive into how implementation affects performance, as well as some techniques that can be used to achieve better performance. The results of Ruby vs Rust should surprise nobody. The Ruby solutions were very programmer-friendly to write, but the best-performing Ruby solution was still vastly slower than the best-performing Rust solution.
What I think may have come as a surprise to some people is the Crystal performance. The Crystal solutions that were essentially identical to the Ruby versions were always much faster than the equivalent in Ruby, and sometimes vastly faster. However, the nature of Crystal means that it also offers other options that aren't available in Ruby. By paying careful attention to the implementation and to the way that memory was used, the fastest Crystal versions delivered shockingly fast performance.
TL;DR -- Jump here for a quick summary
if you jumped straight down here, I tested these with an artificially bloated input file that contained 4 MB of data from the portion of the puzzle input that didn't contain the solution, followed by the full puzzle input again.
Versions used:
Ubuntu 22.04 on an Intel i9-12900HK based laptop (2022 Dell XPS 15)
Ruby 3.2.0 +YJIT
Crystal 1.7.0-dev [292052bb2] (2023-01-04)
rustc 1.68.0-nightly (388538fc9 2023-01-05)
In Part 1:
Ruby's best-performing version took 1.32 seconds on average.
Crystal's best-performing version took 0.0355 seconds, on average.
This seems to be a measurement limitation with the shelltime
and the minimum amount of time necessary to simply execute a Crystal program. Internal timing shows around a 0.015 second runtime for the actual solution....Rust's best-performing version took 0.0356 seconds, on average.
This is likely also a result of the minimum amount of time necessary to simply start an executable on my system, which is why it was essentially identical to the Crystal timings.Crystal's most concise version was shorter than even Ruby's most concise version.
In Part 2:
Ruby's best-performing version took 1.45 seconds on average.
Crystal's best-performing version took 0.0355 seconds on average.
This seems to be a measurement limitation with the shelltime
and the minimum amount of time necessary to simply execute a Crystal program. Internal timing shows around a 0.015 second runtime for the actual solution....Rust's best-performing version took 0.081 seconds on average.
Rust is a justifiably exciting language, offering a phenomenally expressive language for a system programming langauge while at the same time providing mechanisms that make it pretty difficult to do truly hazardous things with memory management, and wrapping it all up with a compiler that produces very fast-executing code.
However, Rust's expressiveness can not match Crystal's. Even the easiest things are generally a lot more finicky to build with Rust than with Crystal. A complaint that I have heard in some circles is how expensive Rust development can be, because of this. This is not a knock on Rust, but rather, is speculation that there are many people who are using Rust because its compiled code runs quickly, but that Rust may not be the right tool for all of those things that it is being used for.
A very expressive language like Crystal, which is very programmer friendly along with providing robust type safety guarantees and the flexibility to tightly tune performance where it is needed could be an ideal tool to for many projects. If Crystal has thus far flown under your radar, maybe it's time to check it out?