
The curious case of Typed Array optimisation

Introduction

Recently I implemented fuzzy search for my ledger application. The current version is very bare-bones and relies heavily on Levenshtein distance, with a few extra calculations to weed out irrelevant results.

The itch to optimise

Levenshtein distance is a fairly simple algorithm, and I was already using an optimised version known as the Wagner-Fischer algorithm. It was fast enough for a fairly large dataset (100-200 list items) and did not strictly require further optimisation. But I had a lot of time.

```javascript
function getDistance(word1, word2) {
  const matrix = Array(word1.length + 1).fill(0).map(() =>
    Array(word2.length + 1).fill(0)
  );
  for (let row = 0; row < word1.length + 1; row++) {
    matrix[row][0] = row;
  }
  for (let col = 0; col < word2.length + 1; col++) {
    matrix[0][col] = col;
  }
  for (let row = 1; row < word1.length + 1; row++) {
    for (let col = 1; col < word2.length + 1; col++) {
      if (word1[row - 1] == word2[col - 1]) {
        matrix[row][col] = matrix[row - 1][col - 1];
        continue;
      }
      matrix[row][col] = Math.min(
        matrix[row - 1][col] + 1,
        matrix[row][col - 1] + 1,
        matrix[row - 1][col - 1] + 1,
      );
    }
  }
  return matrix[word1.length][word2.length];
}
```

Your leetcode brain must be shouting right now seeing word1.length + 1 in the for loop conditions.

This leetcode-brained habit of mine is what actually triggered me to benchmark and optimise the algorithm.

So I set up a test bench using Deno and ran the original version against a version where I pre-calculated the lengths.
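For reference, a Deno bench file for the original version might look roughly like this (the bench name matches the result tables, but the input words and the file layout are my assumptions, not the exact harness from the post):

```javascript
// Sketch of a Deno bench file; run with `deno bench`.
function getDistance(word1, word2) {
  const matrix = Array(word1.length + 1).fill(0).map(() =>
    Array(word2.length + 1).fill(0)
  );
  for (let row = 0; row < word1.length + 1; row++) matrix[row][0] = row;
  for (let col = 0; col < word2.length + 1; col++) matrix[0][col] = col;
  for (let row = 1; row < word1.length + 1; row++) {
    for (let col = 1; col < word2.length + 1; col++) {
      if (word1[row - 1] == word2[col - 1]) {
        matrix[row][col] = matrix[row - 1][col - 1];
        continue;
      }
      matrix[row][col] = Math.min(
        matrix[row - 1][col] + 1,
        matrix[row][col - 1] + 1,
        matrix[row - 1][col - 1] + 1,
      );
    }
  }
  return matrix[word1.length][word2.length];
}

// `deno bench` picks up Deno.bench registrations and prints the
// avg/min/max/p75/p99 table; guarded so the sketch is inert outside Deno.
if (typeof Deno !== "undefined" && typeof Deno.bench === "function") {
  Deno.bench("getDistance - original", () => {
    getDistance("acknowledgement", "acknowledgment");
  });
}
```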

These were the results.

| bench | avg (ns) | min | max | p75 | p99 | runs |
| --- | --- | --- | --- | --- | --- | --- |
| getDistance - original | 1006.92 | 1000.69 | 1039.87 | 1008.73 | 1039.87 | 60 |
| getDistance - pre-calculate length | 974.66 | 964.59 | 1050.67 | 974.85 | 1050.67 | 62 |

I ran the test 3 times and consistently saved ~40 ns. Not a great or significant gain, but a small win.

This small win made me invest more time into optimising the algorithm. After researching and brainstorming for a while, I came up with the optimisation of using two rows (1-D arrays) instead of the full 2-D matrix. It turned out this is a go-to optimisation when the end goal is just the edit distance between two strings.

Here’s what the code looks like:

```javascript
function getEditDistance2Strings(word1, word2) {
  let current = Array(word1.length + 1).fill(0);
  let buffer = Array(current.length).fill(0);
  for (let col = 0; col < current.length; col++) current[col] = col;
  for (let row = 1; row < word2.length + 1; row++) {
    buffer[0] = row;
    for (let col = 1; col < buffer.length; col++) {
      if (word1[col - 1] == word2[row - 1]) {
        buffer[col] = current[col - 1];
      } else {
        buffer[col] = Math.min(
          buffer[col - 1] + 1,
          current[col] + 1,
          current[col - 1] + 1,
        );
      }
    }
    [buffer, current] = [current, buffer];
  }
  return current[word1.length];
}
```
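Before trusting the numbers, it is worth checking that the two-row version computes the same distances as the full matrix. A standalone sanity sketch (both functions are condensed copies of the code above; the helper names are mine):

```javascript
// Full-matrix Wagner-Fischer, condensed.
function fullMatrix(a, b) {
  const m = Array.from({ length: a.length + 1 }, () => Array(b.length + 1).fill(0));
  for (let i = 0; i <= a.length; i++) m[i][0] = i;
  for (let j = 0; j <= b.length; j++) m[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      m[i][j] = a[i - 1] === b[j - 1]
        ? m[i - 1][j - 1]
        : Math.min(m[i - 1][j], m[i][j - 1], m[i - 1][j - 1]) + 1;
    }
  }
  return m[a.length][b.length];
}

// Two-row version, condensed: only the previous row is ever read.
function twoRows(word1, word2) {
  let current = Array.from({ length: word1.length + 1 }, (_, col) => col);
  let buffer = Array(current.length).fill(0);
  for (let row = 1; row < word2.length + 1; row++) {
    buffer[0] = row;
    for (let col = 1; col < buffer.length; col++) {
      buffer[col] = word1[col - 1] === word2[row - 1]
        ? current[col - 1]
        : Math.min(buffer[col - 1], current[col], current[col - 1]) + 1;
    }
    [buffer, current] = [current, buffer];
  }
  return current[word1.length];
}

// The two must agree on every pair, including empty and equal strings.
const pairs = [["kitten", "sitting"], ["flaw", "lawn"], ["", "abc"], ["same", "same"]];
for (const [a, b] of pairs) {
  if (fullMatrix(a, b) !== twoRows(a, b)) throw new Error(`mismatch on ${a}/${b}`);
}
```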

The results speak for themselves.

| bench | avg (ns) | min | max | p75 | p99 | runs |
| --- | --- | --- | --- | --- | --- | --- |
| getDistance - original | 1006.92 | 1000.69 | 1039.87 | 1008.73 | 1039.87 | 60 |
| getDistance - pre-calculate length | 974.66 | 964.59 | 1050.67 | 974.85 | 1050.67 | 62 |
| getDistance - 2 strings | 355.41 | 346.82 | 360.55 | 356.80 | 359.54 | 151 |

Almost a 3x improvement in calculating the edit distance. Oh boy!

Stepping in with assumptions

After seeing the improvement from ~950 ns to ~350 ns, I became even more geeked on optimisation. I quickly realised that I am only storing numbers in the arrays, and an array is an object in JavaScript. Also, my words had a limit of 200 characters, so the distances stored can't possibly be more than that, and they can't be negative either.
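That bound matters more than it might look: a Uint8Array silently wraps values outside 0-255, so distances beyond the limit would come back corrupted. A quick demo (my own illustration, not from the benchmark code):

```javascript
// Uint8Array stores unsigned bytes; out-of-range writes wrap modulo 256,
// which is why the 0..200 bound on distances is what makes it safe here.
const a = new Uint8Array(2);
a[0] = 200; // fits: stored as-is
a[1] = 300; // does not fit: stored as 300 % 256 = 44
console.log(a[0], a[1]); // 200 44
```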

Uint8Array rang in my brain, and I quickly sprayed my hands across the keyboard and came up with a potential optimisation.

```javascript
function getEditDistanceUintArray(word1, word2) {
  const len1 = word1.length + 1;
  const len2 = word2.length + 1;
  let current = new Uint8Array(len1);
  let buffer = new Uint8Array(len1);
  for (let col = 0; col < len1; col++) current[col] = col;
  for (let row = 1; row < len2; row++) {
    buffer[0] = row;
    for (let col = 1; col < len1; col++) {
      if (word1[col - 1] == word2[row - 1]) {
        buffer[col] = current[col - 1];
      } else {
        buffer[col] = Math.min(
          buffer[col - 1] + 1,
          current[col] + 1,
          current[col - 1] + 1,
        );
      }
    }
    [buffer, current] = [current, buffer];
  }
  return current[word1.length];
}
```

The results:

| bench | avg (ns) | min | max | p75 | p99 | runs |
| --- | --- | --- | --- | --- | --- | --- |
| getDistance - original | 999.44 | 993.37 | 1025.92 | 999.91 | 1025.92 | 61 |
| getDistance - pre-calculate length | 1005.09 | 999.91 | 1054.09 | 1005.99 | 1054.09 | 60 |
| getDistance - 2 strings | 351.32 | 347.92 | 358.47 | 352.31 | 356.33 | 153 |
| getDistance - Uint arr flop | 410.17 | 396.28 | 426.10 | 412.54 | 425.94 | 132 |

Not what I expected: instead of reducing the time, I increased it by ~50 ns.

So what’s happening here?

After scratching my head for a while and evaluating what could possibly be the culprit, I suddenly remembered a fact about JavaScript's number type: JavaScript numbers are 64-bit floating-point numbers, so there must be a significant time cost in converting them to and from 8-bit unsigned integers.
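The conversion itself is easy to observe: a value written into a Uint8Array is truncated on the way in, while a Float64Array stores the number as-is, since every JS number already is a 64-bit float (a small demo of my own, not from the original post):

```javascript
// Writing the same value into the two typed arrays: the Uint8Array
// must convert (truncate) it, the Float64Array keeps it verbatim.
const u8 = new Uint8Array([3.7]);
const f64 = new Float64Array([3.7]);
console.log(u8[0]);  // 3  (converted on the way in)
console.log(f64[0]); // 3.7 (no conversion needed)
```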

So I tested the theory out: instead of using Uint8Array, I used Float64Array.

```javascript
function getEditDistanceFloat64(word1, word2) {
  const len1 = word1.length + 1;
  const len2 = word2.length + 1;
  let current = new Float64Array(len1);
  let buffer = new Float64Array(len1);
  for (let col = 0; col < len1; col++) current[col] = col;
  for (let row = 1; row < len2; row++) {
    buffer[0] = row;
    for (let col = 1; col < len1; col++) {
      if (word1[col - 1] == word2[row - 1]) {
        buffer[col] = current[col - 1];
      } else {
        buffer[col] = Math.min(
          buffer[col - 1] + 1,
          current[col] + 1,
          current[col - 1] + 1,
        );
      }
    }
    [buffer, current] = [current, buffer];
  }
  return current[word1.length];
}
```

Results:

| bench | avg (ns) | min | max | p75 | p99 | runs |
| --- | --- | --- | --- | --- | --- | --- |
| getDistance - original | 1018.71 | 1012.61 | 1053.97 | 1019.04 | 1053.97 | 60 |
| getDistance - pre-calculate length | 966.19 | 956.52 | 1060.15 | 965.81 | 1060.15 | 62 |
| getDistance - 2 strings | 352.84 | 347.55 | 361.66 | 354.44 | 359.13 | 152 |
| getDistance - Uint arr flop | 419.73 | 387.63 | 437.49 | 422.73 | 434.27 | 130 |
| getDistance - Float64 to rescue | 388.91 | 371.31 | 440.82 | 389.17 | 440.30 | 139 |

Bingo! There was some truth to the theory. But the typed-array versions still trail the plain-array one.

So, back to researching: why are plain arrays faster than typed arrays in JavaScript here? It turns out that JavaScript engines (in this case V8) track what kind of data a plain array holds and, when possible, back it with a specialised store under the hood, much like a typed array. This is why the performance is almost identical for dynamic arrays and typed arrays.

There are certain optimisations built into Array that allow the engine to specialise an array's storage for the kind of data stored in it. You can learn more about it here.

Conclusion

Optimising JavaScript at a low level is tricky: as we move up the abstraction levels without fine-grained control, behaviour often becomes unpredictable.

Since my constraints on the data were known, I was able to try one more optimisation: pre-allocating the typed arrays. Since the maximum length of my words is 200, I can just pre-allocate typed arrays of length 256 (comfortably above the 201 slots needed) once and reuse them across calls.

```javascript
let current = new Float64Array(256);
let buffer = new Float64Array(256);

function getEditDistancePreAllocated(word1, word2) {
  const len1 = word1.length + 1;
  const len2 = word2.length + 1;
  for (let col = 0; col < len1; ++col) current[col] = col;
  for (let row = 1; row < len2; ++row) {
    buffer[0] = row;
    for (let col = 1; col < len1; col++) {
      if (word1[col - 1] == word2[row - 1]) {
        buffer[col] = current[col - 1];
      } else {
        buffer[col] = Math.min(
          buffer[col - 1] + 1,
          current[col] + 1,
          current[col - 1] + 1,
        );
      }
    }
    [buffer, current] = [current, buffer];
  }
  return current[word1.length];
}
```
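One property worth verifying with module-level scratch buffers is that leftovers from a previous, longer call never leak into a later, shorter one: every cell up to the current word length is rewritten before it is read. A standalone check using a condensed copy of the function (the test words are my own):

```javascript
// Module-level scratch buffers, reused across calls.
let current = new Float64Array(256);
let buffer = new Float64Array(256);

function getEditDistancePreAllocated(word1, word2) {
  const len1 = word1.length + 1;
  const len2 = word2.length + 1;
  // Cells 0..len1-1 are rewritten here and per row below, so stale
  // values beyond len1 from earlier calls are never read.
  for (let col = 0; col < len1; ++col) current[col] = col;
  for (let row = 1; row < len2; ++row) {
    buffer[0] = row;
    for (let col = 1; col < len1; col++) {
      buffer[col] = word1[col - 1] === word2[row - 1]
        ? current[col - 1]
        : Math.min(buffer[col - 1], current[col], current[col - 1]) + 1;
    }
    [buffer, current] = [current, buffer];
  }
  return current[word1.length];
}

// A long call followed by a short one: the short result must be
// unaffected by leftovers in the higher slots.
getEditDistancePreAllocated("acknowledgement", "acknowledgment");
console.log(getEditDistancePreAllocated("cat", "cut")); // 1
```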

Music in my ears!

| bench | avg (ns) | min | max | p75 | p99 | runs |
| --- | --- | --- | --- | --- | --- | --- |
| getDistance - original | 1001.21 | 981.39 | 1026.03 | 1001.90 | 1026.03 | 60 |
| getDistance - pre-calculate length | 976.29 | 971.92 | 1049.28 | 975.92 | 1049.28 | 62 |
| getDistance - 2 strings | 353.82 | 340.93 | 360.73 | 355.25 | 359.17 | 152 |
| getDistance - Uint arr flop | 426.41 | 401.52 | 599.55 | 428.15 | 451.05 | 128 |
| getDistance - Float64 to rescue | 416.75 | 374.52 | 444.09 | 434.84 | 443.40 | 131 |
| getDistance - Float64 pre allocated | 249.02 | 248.09 | 251.75 | 249.21 | 251.32 | 211 |

I was able to optimise the algorithm by a further ~70% over the non-pre-allocated Float64 version. Phew!

I also tried pre-allocating a plain Array, but for some reason it did not improve the result. I am out of guesses and would love to know if you have any idea why pre-allocating a Float64Array worked but pre-allocating an Array did not.