Charlton's Blog

Esoteric Programming Lesson #2: “Bucket Sort”

Vol. 1, Issue #2 in the "Daniel Fucking Nerd-Sniped Me" Series.

Published: Nov 25, 2022
Category: Programming
Tags: , ,

Recently, I was approached by a member of Tampa Devs who was facing a small sorting puzzle at work.

They had been tasked with developing a user interface for an internal invoice management application. However, the schema for the API they had been provided to retrieve invoice data from was organized in a less than ideal way.

The invoice data API was designed such that information related to every invoice was returned in a single big array:

[
  {
    object related to invoice A
  }, 
  {
    object related to invoice B
  },
  {
    object related to invoice C
  },
  {
    another object related to invoice A
  },
  etc
]

This is inconvenient, because organizing all available information about a particular invoice requires one to traverse the entire array and gather relevant entries together.

In my friend’s case, these entries contain data that is continuously updated based on the current state of the user interface, so there was a legitimate need from both an ergonomics and performance standpoint to ensure these entries are properly sorted. But what would that look like, and how?

Suppose I wanted to find all elements related to a particular parent invoice ID of 193830958 in the following response:

[{
    "item_id": "EF135520",
    "item_type": "Request",
    "description": "TEST",
    "parent_id": "193830957",
}, {
    "item_id": "193830958-383",
    "item_type": "Invoice",
    "parent_id": "193830958",
}, {
    "item_id": "193830958-384",
    "item_type": "Invoice",
    "parent_id": "193830958",
}, {
    "item_id": "193830958-383",
    "item_type": "Invoice",
    "parent_id": "193830959",
}, {
    "item_id": "193830958-386",
    "item_type": "Invoice",
    "parent_id": "193830959",
}]

Wouldn’t it be nice if these entries were grouped together by parent ID, rather than floating around in a big one-dimensional array?

Here’s what we’re looking for:

{
  invoice id: [
    related object,
    another related object,
    ...
  ],
  another invoice id: [
    related object,
    another related object,
    ...
  ]
}

To achieve this, we’ll reach for the Bucket Sort algorithm. Bucket Sort works by grouping related objects together into “buckets” (or “bins”), much in the same way you might sort a bag of M&Ms into piles based on their color.

In our particular case, we want to sort our invoice data entries based on their parent_id properties, which groups them in terms of the parent invoice the data is related to.

Let’s re-organize our original array of invoice data into this new structure:

{
    "193830957": [{
        "item_id": "EF135520",
        "item_type": "Request",
        "description": "TEST",
        "parent_id": "193830957"
    }],
    "193830958": [{
        "item_id": "193830958-383",
        "item_type": "Invoice",
        "parent_id": "193830958"
    }, {
        "item_id": "193830958-384",
        "item_type": "Invoice",
        "parent_id": "193830958"
    }],
    "193830959": [{
        "item_id": "193830958-383",
        "item_type": "Invoice",
        "parent_id": "193830959"
    }, {
        "item_id": "193830958-386",
        "item_type": "Invoice",
        "parent_id": "193830959"
    }]
}

Implementing Bucket Sort

Let’s start with a simple outline of Bucket Sort’s steps. Here’s how it works:

For each item of the source array:

  • Examine the current item.
    • Does the current item belong in one of our buckets?
      • If so, place the current item in its corresponding bucket.
        • If a bucket to store the item doesn’t exist, create a new one to store it.

Now that we’ve outlined the sorting logic, let’s re-word it in terms of our invoice data.

For each entry of invoice data:

  • Examine the current invoice data.
    • Does a bucket exist with the parent_id of the invoice data?
      • If so, place the invoice data into the corresponding bucket based on its parent_id property.
        • If a bucket matching the current invoice’s parent_id doesn’t exist, create a new one and store the invoice there.

In JavaScript

Based on the above, we can write a compact sorting implementation in JavaScript.

function bucket_sort(res) {
    b = {};

    for (const e of res)
        e.parent_id in b ? b[e.parent_id].push(e) : b[e.parent_id] = [e];

    return b;
}

In JavaScript (With Comments)

Here’s a heavily commented version to explain what’s going on.

function bucket_sort(res) {
    // Instantiate an empty Object to hold our sorted results
    b = {};

    /* The general idea:
     * Loop through each element.
     *   Examine each element's po_num.
     *     If we've seen this po_num before, add it to the corresponding bucket.
     *     If we haven't seen this po_num before, create a new bucket and place the 
     *   element inside. 
     */

    // Iterate over each element in the array
    for (const e of res)
    /* note: ternary syntax -- (test condition) ? true case expression : false case expression;
     *
     * Our test condition is e.parent_id in b -> We're testing if a property exists 
     * on the b object with a name matching the parent_id of the current element from 
     * the array.
     *
     * Our true case expression is b[e.parent_id].push(e) -> If the b object has a 
     * property matching the current item's parent_id value, then we can push the current
     * element into its corresponding bucket in the b object.
     *
     * We can always call the Array push method in the positive case, since we know the 
     * type of any property's value in the b object will be an Array if it exists (we 
     * instantiate it on the fly otherwise).
     *
     * Our false case expression is b[e.parent_id] = [e] -> If no property exists on the 
     * b object matching the current element's parent_id, we create one and assign 
     * it the value of an array literal containing the current element.
     *
     * Note the syntax in use here, [e], is a syntactically compact method of creating 
     * a new array inside the b object that contains the current element. 
     */
        e.parent_id in b ? b[e.parent_id].push(e) : b[e.parent_id] = [e];

    return b;
}