Converting an OpenFoodFacts CSV Dump into a SQLite Database with Deno and Streams

A few evenings ago, I came up with the idea to develop an app to capture nutritional information about food. There are some paid APIs, like FatSecret, Spoonacular, and Edamam, that offer a wide range of features. However, for a quick prototype, I didn’t want to spend too much money, so I looked for alternatives and came across OpenFoodFacts.

They offer an API (incomplete and poorly documented) as well as database dumps in various formats like JSON, CSV, and MongoDB. The smallest dump was the compressed CSV file, around 1 GB in size. Once you decompress the GZIP archive, the data takes up over 9 GB on disk. This size is still manageable – unlike the 40+ GB MongoDB dump.

The challenge now was how to process this much data efficiently and with minimal resources. Loading everything into RAM isn’t a good idea. Why? First, you’d need to hold the entire 9 GB file in memory, then build a second data structure from it for the conversion. This would likely double the memory requirement.

Deno’s V8 heap is limited to around 2 GB by default. You can raise this limit with deno run --v8-flags=--max-old-space-size=SIZE_IN_MB [CMD], but even 32 GB would be tight for our purpose.

Deno File Streams to the Rescue

This is where data streams come in. The advantage of streams is that you don’t have to load the entire file into memory; only small chunks are loaded at a time. The processing happens incrementally, like a line reader that only reads one line at a time. This keeps memory usage low and makes the conversion efficient. Long story short:

// Setup our data stream...
import { TextLineStream } from "jsr:@std/streams";

const inputFile = "products.csv.gz"; // path to the gzipped CSV dump
const fileHandle = await Deno.open(inputFile);
const stream = fileHandle.readable
  .pipeThrough(new DecompressionStream("gzip")) // unzip on the fly
  .pipeThrough(new TextDecoderStream()) // raw bytes to text
  .pipeThrough(new TextLineStream()); // use line breaks to define our data chunks

const reader = stream.getReader();
const db = initDB();
let data: ReadableStreamReadResult<string>;

const headers = (await reader.read()).value!.split('\t'); // quick and dirty

do {
  data = await reader.read();
  if (data.value === undefined) break; // stream exhausted, nothing left to parse
  // data.value contains one text line from the CSV file
  // do some checks to ensure the data is valid
  const values = data.value.split('\t');
  process(headers, values);
} while (!data.done);

// close the database, otherwise resources could leak
db.close();

Set up the SQLite database, no rocket science here:

function initDB() {
  // import { Database } from "jsr:@db/sqlite@0.11";
  const db = new Database("test.db");
  db.exec(`
    CREATE TABLE IF NOT EXISTS food_items (
      code TEXT PRIMARY KEY,
      product_name TEXT NOT NULL
      -- ... add the remaining columns you need
    );
  `);
  return db;
}
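
Before kicking off a huge one-shot import, it can pay off to relax SQLite’s durability settings. These are standard SQLite PRAGMAs; since a failed import can simply be re-run from scratch, trading durability for speed is fine here. A minimal sketch (the helper function is my own addition):

```typescript
// Speed up a one-shot bulk import by relaxing durability.
// Only do this for throwaway imports you can re-run on failure.
function tuneForBulkImport(db: { exec(sql: string): void }) {
  db.exec("PRAGMA journal_mode = WAL;"); // write-ahead log instead of rollback journal
  db.exec("PRAGMA synchronous = OFF;"); // don't fsync after every write
}
```

Call it right after initDB() and before the first insert.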

Let’s get our hands on the raw CSV data and convert it into a usable data structure. The code here is just a rough version to get you up and running. I like to get hands-on early to see some results; you can make it pretty later.

// headers are just a string array like ["code", "product_name", ...]
// values represents one data set from the csv file like
//   ["0000001", "Apple Pie", ...]
//      ^ code     ^ product_name
// so we know that values[0] is always the "code" here and 
// values[1] is the product name

function process(headers: string[], values: string[]) {
  // for demo purposes
  const indexCode = headers.indexOf("code");
  const indexProductName = headers.indexOf("product_name");

  // perform some validity checks, e.g. that product name and code are set

  // the data object we want to store
  const foodItem: any = {};

  for (let h = 0; h < headers.length; h++) {
    const header = headers[h];

    // implement a filter for headers here

    foodItem[header] = (values[h] ?? "").trim(); // some lines may have fewer columns
  }

  persist(foodItem);
}
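
The header filter is left open above; a minimal sketch could be an allow-list that only keeps the columns you actually need. The column names here are assumptions for illustration, check them against the header line of your dump:

```typescript
// Hypothetical allow-list; adjust to the columns present in your dump.
const KEEP = new Set(["code", "product_name", "brands", "energy_100g"]);

function toFoodItem(headers: string[], values: string[]): Record<string, string> {
  const item: Record<string, string> = {};
  for (let h = 0; h < headers.length; h++) {
    if (!KEEP.has(headers[h])) continue; // skip columns we don't store
    item[headers[h]] = (values[h] ?? "").trim();
  }
  return item;
}
```

Filtering early also keeps the table schema small, since only allow-listed columns ever reach persist().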

Now let’s store our food data object into our database.

function persist(foodItem: any) {
  // perform some sanity checks
  // here we just check if there is actually any value we can import
  const insertKeys = Object.keys(foodItem)
    .filter((k) => foodItem[k] != "" && foodItem[k] != null);

  // prepare our SQL query for inserting values
  // note: only the values are bound as parameters; the column names come
  // from the CSV header, so make sure they are filtered/trusted
  const sql = `INSERT INTO food_items (${ insertKeys.join(",") }) 
    VALUES (${ insertKeys.map(() => "?").join(",") });`;

  // provide the needed values
  const insertValues = insertKeys.map((k) => foodItem[k]);

  try {
    db.exec(sql, ...insertValues);
  } catch (err) {
    // implement your error handling
    console.error(err);
  }
}
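
One practical note: with millions of rows, per-row inserts are slow, because SQLite implicitly wraps every single INSERT in its own transaction. Batching many inserts into one explicit transaction speeds this up considerably. A sketch of such a helper (the function and batch size are my own additions, not part of the code above):

```typescript
// Wrap every `batchSize` inserts in one explicit transaction so SQLite
// doesn't commit (and fsync) after each row.
function makeBatchedPersist(
  db: { exec(sql: string): void },
  persistRow: (item: Record<string, string>) => void,
  batchSize = 10_000, // arbitrary choice; tune for your machine
) {
  let pending = 0;
  const persistBatched = (item: Record<string, string>) => {
    if (pending === 0) db.exec("BEGIN");
    persistRow(item);
    if (++pending >= batchSize) {
      db.exec("COMMIT");
      pending = 0;
    }
  };
  // call once after the stream loop to commit the last partial batch
  const flush = () => {
    if (pending > 0) {
      db.exec("COMMIT");
      pending = 0;
    }
  };
  return { persistBatched, flush };
}
```

In the stream loop you would call persistBatched() instead of persist(), and flush() once before db.close().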

That’s basically it. You can use this as a quick starting template to get something up and running.