I Wrote Ruby and the Compiler Made It 10x Faster

I wrote a log analyzer in Ruby. Then I compiled it to a native binary. It ran more than 10 times faster than the same algorithm in CRuby (14.8x in the benchmark below), and 3 times faster than idiomatic Ruby with regex and each.

The part that surprised me most: the idiomatic Ruby version – the one with regex parsing and sort_by and Hash.new(0) – was already 4.7x faster than the while-loop version running in CRuby. The interpreter really doesn’t like character-by-character string walking. But once compiled through Spinel, that same while-loop code becomes the fastest of all.

Same language. Same output. The compiler just made it fast.

How I Got Here

I was looking at a project called Spinel. It’s an AOT compiler for Ruby – you feed it .rb files and it produces a standalone native binary. No interpreter. No VM. No runtime dependency.

The idea: parse the Ruby source into an AST, run whole-program type inference to figure out what every variable and method actually is, then emit C code that the system compiler turns into a binary.

What caught my attention was the benchmark table. Spinel claims ~7.8x geometric mean speedup over Ruby 4.0 with YJIT across 28 benchmarks. The mandelbrot benchmark is 55x faster. N-queens is 34x faster.

“Okay, but what about real code?” Mandelbrot is pure integer math in a tight loop. That’s compiler catnip. What about something that does actual work – string parsing, hash lookups, formatting, I/O?

So I built something real. Two versions of it, actually.

The Program

I wrote an access log analyzer. The kind of tool you’d actually use to make sense of an nginx or Apache log file. It parses Combined Log Format lines, tallies up statistics, and prints a report.

Combined Log Format looks like this:

10.0.0.1 - - [21/May/2026:08:30:00 +0000] "GET /api/users HTTP/1.1" 200 42857 "-" "Mozilla/5.0"

The analyzer extracts the host, HTTP method, URL path, status code, response size, and timestamp from each line. Then it computes:

  • Total requests, bandwidth, error/redirect rates
  • Status code distribution
  • HTTP method breakdown
  • Top URLs by hits and bandwidth
  • Hourly traffic histogram
  • Top requesting hosts

Real work. String parsing. Hash lookups. Sorting. Formatting. The kind of program where you’d normally reach for Python or Go because Ruby “is too slow for data processing.”

Two Versions, Same Output

I wrote two versions of the same program. Both produce byte-identical output.

Version 1: The Spinel Subset

This is the version that compiles through Spinel. While-loops, character-by-character string walking, explicit hash operations – because that’s what the type inference engine can work with.

class LineParser
  def initialize
    @host = ""
    @http_verb = ""
    @url = ""
    @code = 0
    @size = 0
    @timestamp = ""
    @valid = 0
  end

  attr_reader :host, :http_verb, :url, :code, :size, :timestamp, :valid

  def parse(line)
    @valid = 0
    return if line.length < 20

    # Extract host (up to first space)
    sp = 0
    while sp < line.length && line[sp] != " "
      sp = sp + 1
    end
    return if sp == 0
    @host = line[0, sp]

    # ... character-by-character extraction continues
    @valid = 1
  end
end

Notice what’s missing: no regex. No split. No match. I’m walking the string character by character because that’s what Spinel’s type system wants. Every variable has a known type. Every method returns a known type. The compiler tracks this through the whole program.
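To give a feel for the elided extraction steps, here is a minimal sketch in the same while-loop style. The helper names (index_of, extract_host, extract_timestamp, extract_hour) are hypothetical and not taken from the repository, and I have not verified this exact code against Spinel. Each helper returns a single type on every path (an Integer or a String, never nil).

```ruby
# Hypothetical helper (not from the post's source): index of the first `ch`
# in `line` at or after `start`, or line.length when there is none.
def index_of(line, ch, start)
  i = start
  while i < line.length && line[i] != ch
    i = i + 1
  end
  i
end

# Host: everything before the first space.
def extract_host(line)
  sp = index_of(line, " ", 0)
  line[0, sp]
end

# Timestamp: the text between "[" and "]", or "" when a bracket is missing.
def extract_timestamp(line)
  lb = index_of(line, "[", 0)
  rb = index_of(line, "]", lb)
  return "" if rb >= line.length
  line[lb + 1, rb - lb - 1]
end

# Hour: the two characters after the first ":" in the timestamp, or "".
def extract_hour(timestamp)
  colon = index_of(timestamp, ":", 0)
  return "" if colon + 2 >= timestamp.length
  timestamp[colon + 1, 2]
end

sample = '10.0.0.1 - - [21/May/2026:08:30:00 +0000] "GET /api/users HTTP/1.1" 200 42857 "-" "Mozilla/5.0"'
puts extract_host(sample)                     # 10.0.0.1
puts extract_timestamp(sample)                # 21/May/2026:08:30:00 +0000
puts extract_hour(extract_timestamp(sample))  # 08
```

Returning an empty string instead of nil on malformed input is the same idea as the @valid flag: the caller checks a value, not a type.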

Version 2: Idiomatic Ruby

This is the version a Ruby developer would actually write. Regex parsing. Struct for data. Hash.new(0). sort_by. each. The kind of code that makes Ruby pleasant.

LOG_PATTERN = %r{
  ^
  (\S+)               \s+   # host
  \S+                 \s+   # ident
  \S+                 \s+   # user
  \[([^\]]+)\]        \s+   # date
  "(\S+)\s+(\S+)\s+(\S+)" \s+  # method path proto
  (\d{3})             \s+   # status
  (\d+)               \s*   # bytes
}x

Line = Struct.new(:host, :http_verb, :url, :code, :size, :timestamp)

def parse_line(line)
  m = line.match(LOG_PATTERN)
  return nil unless m
  Line.new(m[1], m[3], m[4], m[6].to_i, m[7].to_i, m[2])
end

# Tallying
status_counts = Hash.new(0)
verb_counts = Hash.new(0)
url_counts = Hash.new(0)

lines.each do |line|
  entry = parse_line(line)
  next unless entry

  status_counts[entry.code.to_s] += 1
  verb_counts[entry.http_verb] += 1 unless entry.http_verb.empty?
  url_counts[entry.url] += 1 unless entry.url.empty?
end

# Reporting
def top_n(hash, n)
  hash.sort_by { |_, v| -v }.first(n)
end

Clean. Readable. The kind of code you’d commit without a second thought. And it won’t compile through Spinel – too much dynamic dispatch, nil returns from parse_line, polymorphic sort_by with block, Hash.new(0) with default values.
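To make the capture-group numbering concrete, here is a small sketch that runs a condensed, single-line form of the pattern above against the sample line. PATTERN and captures_for are illustrative names of mine, not part of the repository.

```ruby
# Condensed single-line form of LOG_PATTERN with the same seven capture groups.
PATTERN = /^(\S+)\s+\S+\s+\S+\s+\[([^\]]+)\]\s+"(\S+)\s+(\S+)\s+(\S+)"\s+(\d{3})\s+(\d+)/

# Hypothetical helper: return the captures as a hash, or nil when the line
# does not match.
def captures_for(line)
  m = line.match(PATTERN)
  return nil unless m
  {
    host: m[1], timestamp: m[2], http_verb: m[3], url: m[4],
    protocol: m[5], code: m[6].to_i, size: m[7].to_i
  }
end

sample = '10.0.0.1 - - [21/May/2026:08:30:00 +0000] "GET /api/users HTTP/1.1" 200 42857 "-" "Mozilla/5.0"'
fields = captures_for(sample)
puts fields[:http_verb]  # GET
puts fields[:url]        # /api/users
puts fields[:code]       # 200
```

Group 2 is the timestamp and group 5 the protocol, which is why parse_line passes m[2] last and never uses m[5]. A request field that is not three space-separated tokens, such as a bare "-", does not match and yields nil.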

That’s the tradeoff. Idiomatic Ruby is beautiful. The Spinel subset is fast. But how much faster?

The Constraint That Taught Me the Most

Figure: Type inference rejecting nil returns vs the @valid flag fix

The first version didn’t compile. I wrote what I thought was normal Ruby and Spinel rejected it.

Here’s what I did wrong:

def parse_line(line)
  return nil if line.length < 20
  # ... parse ...
  LogEntry.new(host, method, path, status, bytes, date)
end

Two problems. First, returning nil from a method that also returns a class instance – the type inference needs one return type at compile time, and a “class or nil” union wasn’t resolving. Second, I named a variable method – which collides with Ruby’s method method, so the type inference thought the variable was a Method object instead of a string.

The fix was to change the design:

def parse(line)
  @valid = 0
  return if line.length < 20
  # ... parse, setting instance variables ...
  @valid = 1
end

Instead of returning nil or an object, the parser sets a @valid flag. The caller checks lp.valid == 1. No polymorphic return. No nil. The type is always the same – the parser object itself, with its state changed.
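On the calling side, the pattern looks roughly like this. TinyParser is a stand-in I wrote for illustration: it follows the @valid-flag design but only extracts the host. count_valid is a hypothetical caller, not the repository's main loop.

```ruby
# Stand-in parser for illustration only: same @valid-flag design as the
# post's LineParser, but it extracts nothing beyond the host.
class TinyParser
  attr_reader :host, :valid

  def initialize
    @host = ""
    @valid = 0
  end

  def parse(line)
    @valid = 0
    return if line.length < 20
    sp = 0
    while sp < line.length && line[sp] != " "
      sp = sp + 1
    end
    return if sp == 0
    @host = line[0, sp]
    @valid = 1
  end
end

# Hypothetical caller: one parser object is reused for every line, and the
# caller reads the flag instead of testing a return value for nil.
def count_valid(lines)
  lp = TinyParser.new
  ok = 0
  i = 0
  while i < lines.length
    lp.parse(lines[i])
    if lp.valid == 1
      ok = ok + 1
    end
    i = i + 1
  end
  ok
end
```

Reusing a single parser also avoids allocating one object per log line.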

This is the most interesting part of working with an AOT compiler: it changes how you write code. Not because the language is different – it’s still Ruby – but because you’re thinking about types. Not “what type should I declare?” but “what type does this actually have at runtime, and can the compiler prove it?”

The Statistics Engine

The stats accumulator in the Spinel version is straightforward Ruby with hashes:

class Stats
  def initialize
    @total = 0
    @errors = 0
    @redirects = 0
    @total_bytes = 0
    @status_counts = {}
    @verb_counts = {}
    @url_counts = {}
    @url_bytes = {}
    @hour_counts = {}
    @host_counts = {}
  end

  def record(lp)
    @total = @total + 1

    s = lp.code
    if s >= 500
      @errors = @errors + 1
    elsif s >= 300 && s < 400
      @redirects = @redirects + 1
    end

    sk = s.to_s
    if @status_counts.has_key?(sk)
      @status_counts[sk] = @status_counts[sk] + 1
    else
      @status_counts[sk] = 1
    end
    # ... similar for URLs, verbs, hours, hosts ...
  end
end

When Spinel compiles this, those {} literals become sp_StrIntHash_new() calls – a C struct with open-addressing hash tables backed by string hashing. The has_key? / [] / []= calls become inline C function calls that operate on the struct directly. No method dispatch. No hash method lookup. No dynamic type checking.
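The explicit has_key? branch does the same job as Hash.new(0) in the idiomatic version. A minimal sketch with two hypothetical helpers (tally_explicit and tally_idiomatic are my names) shows they produce the same counts in CRuby:

```ruby
# Explicit style, as in the Stats class: check for the key, then set or bump.
def tally_explicit(keys)
  counts = {}
  i = 0
  while i < keys.length
    k = keys[i]
    if counts.has_key?(k)
      counts[k] = counts[k] + 1
    else
      counts[k] = 1
    end
    i = i + 1
  end
  counts
end

# Idiomatic style: a default value of 0 makes the branch unnecessary.
def tally_idiomatic(keys)
  counts = Hash.new(0)
  keys.each { |k| counts[k] += 1 }
  counts
end

codes = ["200", "500", "200", "302", "200"]
puts tally_explicit(codes) == tally_idiomatic(codes)  # true
```

The explicit form keeps the hash a plain string-to-integer map with no default-value behaviour, which fits the point above about {} literals compiling to a fixed hash type.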

That’s where the speed comes from. Not making Ruby faster – removing the Ruby runtime entirely.

Compiling

Figure: Spinel compilation pipeline: Ruby to AST to IR to C to native binary

./spinel spinel_nedim/log_analyzer.rb -o log_analyzer

That’s it. One command. The spinel wrapper runs three stages:

  1. spinel_parse – parses the Ruby source using Prism, serializes the AST
  2. spinel_analyze – whole-program type inference, produces an IR file
  3. spinel_codegen – consumes AST + IR, emits C code, pipes it to cc

The final binary is standalone. No Ruby installation needed. No shared libraries beyond libc and libm.

The Moment of Truth

I ran the compiled binary against 10,000 synthetic log entries:

========================================
  ACCESS LOG REPORT
========================================

Total requests:  10000
Total bytes:     239 MB
Errors (5xx):    953 (9.5%)
Redirects (3xx): 2741 (27.4%)

--- Status Codes ---
  200: 5401 (54.0%)
  500: 953 (9.5%)
  302: 936 (9.3%)
  ...

--- Requests by Hour ---
  00:00 |######################################| 422
  01:00 |####################################| 397
  ...
  23:00 |####################################| 396

--- Top 10 Hosts ---
  3750  10.0.0.1
  1250  10.0.0.2
  ...
========================================

Then I ran both Ruby files with CRuby. Identical output. Byte for byte.

Then I wrote a proper benchmark.

The Benchmark

Figure: Benchmark results across 5 variants processing 100K log entries

I generated a 100K-line Combined Log Format file (9.3 MB) and ran all five variants: 15 iterations each, 5 warmup runs, median time reported.

Variant                             Best       Median     vs idiomatic CRuby
Spinel native binary                120ms      122ms      3.2x faster
CRuby + YJIT (idiomatic)            325ms      335ms      1.2x faster
CRuby (idiomatic Ruby)              375ms      386ms      1.0x (baseline)
CRuby + YJIT (while-loop subset)    1,266ms    1,293ms    0.3x
CRuby (while-loop subset)           1,785ms    1,804ms    0.2x

Ruby 4.0.1 +PRISM, x86_64-linux. All variants processing the same 100K-line file.
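For anyone who wants to reproduce the method, here is a minimal sketch of a warmup-then-median timing loop. It is not the harness from the repository. measure and median are hypothetical names, and I am assuming the 5 warmup runs are separate from the 15 timed iterations.

```ruby
# Median of a list of numbers (mean of the two middle values for even sizes).
def median(values)
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Run the block `warmup` times untimed, then `iterations` times timed.
# Returns [best_ms, median_ms] using a monotonic clock.
def measure(iterations, warmup)
  warmup.times { yield }
  samples = []
  iterations.times do
    t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    samples << (t1 - t0) * 1000.0
  end
  [samples.min, median(samples)]
end

# Usage (binary and file names are placeholders):
#   best, med = measure(15, 5) { system("./log_analyzer", "access_100k.log", out: File::NULL) }
best, med = measure(15, 5) { (1..10_000).sum }
puts format("best %.3fms, median %.3fms", best, med)
```

The median is less sensitive than the mean to a single slow run, which is why it makes a reasonable headline number next to the best time.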

Same algorithm, different runtimes: The while-loop subset in CRuby takes 1,804ms. The exact same code through Spinel: 122ms. 14.8x faster. Same loops. Same hash operations. Same string slicing. The runtime just doesn’t exist anymore.

Different algorithms, same language: Idiomatic Ruby in CRuby (386ms) is 4.7x faster than the while-loop version (1,804ms). Regex parsing and Hash.new(0) are C inside CRuby – fast even in the interpreter. The character-by-character while-loop code is pure Ruby bytecode, and the interpreter overhead is brutal.

But compile that “slow” code through Spinel: 1,804ms to 122ms. Now it’s 3.2x faster than the idiomatic version. The slowest code in the interpreter became the fastest after compilation. The interpreter penalized the explicit code. The compiler rewarded it.

YJIT helps the while-loop code more than the idiomatic code. While-loop: 1,804ms to 1,293ms with YJIT (1.4x). Idiomatic: 386ms to 335ms (1.2x). Makes sense – YJIT optimizes hot Ruby bytecode, and the while-loop code has way more bytecode to optimize. The idiomatic code already spends most of its time in C-implemented builtins (regex, sort_by), which YJIT can’t speed up.
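The ratios quoted above can be re-derived from the median column of the table. A tiny sketch (MEDIAN_MS and speedup are names I made up for this check):

```ruby
# Median times in milliseconds, copied from the benchmark table.
MEDIAN_MS = {
  spinel: 122,
  yjit_idiomatic: 335,
  cruby_idiomatic: 386,
  yjit_while: 1293,
  cruby_while: 1804
}

# How many times faster `fast_ms` is than `slow_ms`, to one decimal place.
def speedup(slow_ms, fast_ms)
  (slow_ms.to_f / fast_ms).round(1)
end

puts speedup(MEDIAN_MS[:cruby_while], MEDIAN_MS[:spinel])               # 14.8
puts speedup(MEDIAN_MS[:cruby_while], MEDIAN_MS[:cruby_idiomatic])      # 4.7
puts speedup(MEDIAN_MS[:cruby_idiomatic], MEDIAN_MS[:spinel])           # 3.2
puts speedup(MEDIAN_MS[:cruby_while], MEDIAN_MS[:yjit_while])           # 1.4
puts speedup(MEDIAN_MS[:cruby_idiomatic], MEDIAN_MS[:yjit_idiomatic])   # 1.2
```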

What the Compiler Actually Does

Figure: CRuby method dispatch chain vs Spinel direct C function call

The speedup isn’t magic. It’s what happens when you remove indirection.

In CRuby, every method call goes through:

  1. Look up the method on the receiver’s class
  2. Check if the method has been redefined
  3. Check if the receiver is a special type (integer, string, etc.)
  4. Dispatch to the C implementation or the interpreted bytecode

Spinel skips all of that. After type inference, the compiler knows:

  • lp is always a LineParser instance – so lp.parse(...) becomes a direct C function call sp_LineParser_parse(lp, line)
  • @status_counts is always a StrIntHash – so @status_counts.has_key?(sk) becomes sp_StrIntHash_has_key(self->iv_status_counts, sk), an inline open-addressing hash lookup
  • lp.code is always an integer – so lp.code >= 500 is a plain C integer comparison, no type checking

The type inference is whole-program. It runs to a fixpoint – iterating over the AST until every variable, parameter, return type, and instance variable stabilizes. Then it writes that into an IR file, and codegen reads the IR to emit C.

If inference can’t resolve something – a variable that could be a string or integer depending on runtime – the compiler emits a “poly” path with runtime type checks. But the goal is to resolve as much as possible statically, so the generated C is close to what a C programmer would write by hand.
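Here is a small illustration of the kind of code that has no single static type, next to a type-stable rewrite. Both methods are hypothetical. Whether Spinel takes the poly path for this exact case is my assumption, not something I checked against the generated C. The scenario is real, though: Combined Log Format writes "-" in the size field when no body was sent.

```ruby
# Type-unstable: a String on one path and an Integer on the other, so a
# caller cannot know statically what it gets back.
def size_field_unstable(text)
  return "-" if text == "-"
  text.to_i
end

# Type-stable: always an Integer; "-" (no body sent) is mapped to 0.
def size_field_stable(text)
  return 0 if text == "-"
  text.to_i
end

puts size_field_unstable("-").class  # String
puts size_field_unstable("42").class # Integer
puts size_field_stable("-").class    # Integer
puts size_field_stable("42").class   # Integer
```

With the stable version, something like total + size_field_stable(text) can be plain integer addition with no runtime check.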

The Paradox of Idiomatic Ruby

Figure: The paradox: slowest in interpreter becomes fastest after compilation

There’s something uncomfortable in these numbers. The idiomatic Ruby version – regex, sort_by, each – is faster in CRuby than the explicit while-loop version. Not because idiomatic Ruby is inherently better. Because CRuby’s builtins are written in C, and anything that delegates to a C builtin wins. Regex matching is C. sort_by is C. Hash.new(0) with increment is C.

The while-loop code is pure Ruby. It does the same work – scanning characters, looking up hash keys, incrementing counters – but every operation goes through method dispatch, type checking, and bytecode interpretation.

The interpreter punished me for being explicit.

Spinel flips that around. When the whole program is compiled to C, there are no “builtins” vs “user code.” Everything is native. The while-loop code that was slow in the interpreter becomes direct C – character comparisons are line[sp] != ' ', hash lookups are sp_StrIntHash_has_key(), integer increments are self->iv_total++. No dispatch. No type checking. Just the operations.

The idiomatic version can’t compile through Spinel, so it never gets this treatment. It’s stuck at the speed of CRuby’s C builtins – fast, sure, but still going through the interpreter’s method dispatch to reach them.

The slowest code in the interpreter became the fastest after compilation. That’s the paradox. Write what the interpreter wants, you get interpreter speed. Write what the compiler wants, the compiler gives you C speed.

Then I Tried a Real Log File

My nginx access log had 34 entries (it’s a quiet server):

./log_analyzer /var/log/nginx/access.log
========================================
  ACCESS LOG REPORT
========================================

Total requests:  34
Total bytes:     4 KB
Errors (5xx):    0 (0.0%)
Redirects (3xx): 0 (0.0%)

--- Status Codes ---
  404: 34 (100.0%)

--- HTTP Methods ---
  GET: 34 (100.0%)

--- Top 15 URLs by Hits ---
  34  /fpm-status?json

--- Requests by Hour ---
  08:00 |#############| 5
  20:00 |##################| 7
  21:00 |########################################| 15
  22:00 |##################| 7

--- Top 10 Hosts ---
  34  127.0.0.1
========================================

Every request is a 404 from localhost hitting /fpm-status?json. That’s my health check. The log analyzer parsed it correctly from a real Combined Log Format file.
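The histogram rows in that report are consistent with a 40-character bar scaled by integer division against the busiest hour (15 requests gives 40 marks, 7 gives 18, 5 gives 13). A minimal sketch of that rendering follows. BAR_WIDTH and bar_line are my names, and the scaling rule is inferred from the output rather than read from the source.

```ruby
BAR_WIDTH = 40  # assumed width, inferred from the report above

# Render one histogram row: zero-padded hour, scaled bar, raw count.
def bar_line(hour, count, max_count)
  n = max_count > 0 ? count * BAR_WIDTH / max_count : 0
  bar = ""
  i = 0
  while i < n
    bar = bar + "#"
    i = i + 1
  end
  label = hour < 10 ? "0" + hour.to_s : hour.to_s
  "  " + label + ":00 |" + bar + "| " + count.to_s
end

puts bar_line(8, 5, 15)
puts bar_line(20, 7, 15)
puts bar_line(21, 15, 15)
```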

What I Had to Unlearn

Writing Ruby for an AOT compiler requires unlearning some habits:

1. Nil is not your friend. In CRuby, nil is everywhere. Methods return nil when they fail. nil is the default hash value. nil is the result of puts. In Spinel’s subset, returning nil from a method that also returns objects creates a type union that complicates inference. The fix: use sentinel values, validity flags, or default empty objects.

2. Variable names matter more than you think. Naming a variable method or path can shadow Ruby builtins. In CRuby this is harmless – the local variable wins. In Spinel, the type inference sees method and thinks “this is the method method called on something,” not “this is a local string variable.” The fix: use more specific names (http_verb, url).

3. Duck typing doesn’t survive compilation. If you call .length on something that could be a string or an array depending on the branch, the compiler can’t emit a single C expression for that. It either picks one (and warns you) or falls back to a poly dispatch path. The fix: make sure every variable has one clear type, or be explicit about the types.

4. each with blocks works, but while is faster. Spinel supports each with blocks, but the block dispatch adds overhead in the generated C. A while loop with integer indexing compiles to a tighter C loop. For hot paths, I switched to while.

5. Don’t optimize for the interpreter. In CRuby, you learn to reach for builtins because they’re implemented in C and fast. In Spinel, everything becomes C. The explicit while-loop code compiles just as well as a call to a builtin – sometimes better, because the compiler can see through the whole operation and optimize across boundaries. The “slow” code in the interpreter can be the fast code after compilation.

These aren’t limitations. They’re the same tradeoffs you make in any statically-typed language. The difference: you still get Ruby syntax and semantics while making them.
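As a concrete instance of point 4, here is the same reduction written both ways. The method names are hypothetical and the two are equivalent in CRuby. The claim that the while form compiles to a tighter C loop is the observation above, not something this snippet measures.

```ruby
# Block-based: idiomatic, but goes through a block call per element.
def total_bytes_each(sizes)
  total = 0
  sizes.each { |s| total += s }
  total
end

# Index-based: explicit integer counter, no block.
def total_bytes_while(sizes)
  total = 0
  i = 0
  while i < sizes.length
    total = total + sizes[i]
    i = i + 1
  end
  total
end

sizes = [42857, 512, 0, 1024]
puts total_bytes_each(sizes) == total_bytes_while(sizes)  # true
```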

What I’d Use This For

3x faster than idiomatic Ruby. 15x faster than the same algorithm in CRuby. That’s the territory where you stop reaching for Go and start reaching for Ruby:

  • Log processing pipelines. Parse, filter, aggregate – the kind of thing you’d write in Go or Rust for speed, but would rather write in Ruby for readability.
  • Data ETL scripts. Transform CSV, JSON, or custom formats. The kind of one-off script that’s fast to write in Ruby but too slow at scale.
  • CLI tools. Standalone binaries with no runtime dependency. Ship one file, run it anywhere.
  • Batch processing. Any workload that runs in a loop over a large dataset where the interpreter overhead of CRuby dominates.

The tradeoff: you write a subset of Ruby. No send. No method_missing. No define_method. No monkey-patching. No regex in the compiled version. Spinel can’t compile code that depends on runtime metaprogramming.

Most CLI tools and data processing scripts don’t need metaprogramming though. They use classes, methods, hashes, arrays, strings, and integers. That’s exactly what Spinel compiles well.

What I Learned

  1. The interpreter punishes explicit code. The while-loop version was 4.7x slower than the idiomatic version in CRuby. Not because while-loops are slow – because the interpreter adds overhead to every single operation. Regex and builtins are fast in CRuby because they’re already C. Your Ruby code is slow because it’s still interpreted. The compiler removes that asymmetry.

  2. Type inference changes how you think about code. Not in the “I must annotate everything” way. More like “can the compiler prove what this variable is?” You start writing code that’s easier for the compiler to reason about – and it turns out that code is easier for humans too. Specific names. Clear types. No nil-shaped footguns.

  3. Removing the runtime is the optimization. JIT makes the hot path fast. AOT removes the cold path overhead – method dispatch, type checking, GC bookkeeping, frame allocation. When everything compiles to direct C function calls, there’s no “cold” code. Native speed from the first instruction.

  4. The Ruby subset is more capable than I expected. Classes, instance variables, attr_reader, File.open, each_line, ARGV, to_i, to_s, string indexing, has_key?, hash literals, array push and keys – enough to write real programs. Not everything, but enough.

  5. Deterministic output is worth designing for. The synthetic data generator uses a fixed-seed PRNG. Same input, same output. I diffed the Spinel binary against CRuby to verify correctness. If you’re building tools that process data, reproducibility is a debugging superpower.

  6. AOT makes Ruby a systems language. Not “Ruby can write an OS kernel.” More like “Ruby can write CLI tools that start instantly, use no runtime, and process data 3x faster than idiomatic Ruby.” That changes where Ruby fits.
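For point 5, a fixed-seed generator can be as small as the sketch below. This is not the generator from the repository: the value pools, the synthetic_lines name and the use of Ruby's Random class are all assumptions made for illustration.

```ruby
# Illustrative value pools (made up for this sketch).
HOSTS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
PATHS = ["/", "/api/users", "/login"]
CODES = [200, 302, 404, 500]

# Generate `count` Combined Log Format lines from a seeded PRNG.
# The same seed always yields the same lines.
def synthetic_lines(count, seed)
  rng = Random.new(seed)
  lines = []
  count.times do
    host = HOSTS[rng.rand(HOSTS.length)]
    path = PATHS[rng.rand(PATHS.length)]
    code = CODES[rng.rand(CODES.length)]
    hour = rng.rand(24)
    size = rng.rand(100_000)
    lines << format(
      '%s - - [21/May/2026:%02d:00:00 +0000] "GET %s HTTP/1.1" %d %d "-" "Mozilla/5.0"',
      host, hour, path, code, size
    )
  end
  lines
end

puts synthetic_lines(3, 42) == synthetic_lines(3, 42)  # true
```

Because the input is reproducible, the compiled binary's report can be compared against the CRuby output with a plain diff.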

The Code

The code is at github.com/neidiom/spinel under spinel_nedim/:

  • log_analyzer.rb – the Spinel-compiled while-loop version
  • log_analyzer_idiomatic.rb – idiomatic Ruby with regex
  • benchmark_log_analyzer.rb – the benchmark harness

Compile it yourself:

git clone https://github.com/neidiom/spinel.git
cd spinel
make deps && make
./spinel spinel_nedim/log_analyzer.rb -o log_analyzer
./log_analyzer /var/log/nginx/access.log

If you’ve ever wanted Ruby to be faster without rewriting in Go, I’d love to hear what you’d build. Find me on LinkedIn or GitHub. Let’s make Ruby fast.

This post is copyrighted by the author.