Adding a new field to a Firestore collection

In my app I tried to work exclusively with dates stored in UTC, but some combination of the Firestore SDK, the browser, and my (lack of) JavaScript skills made round-tripping dates really hard. A date saved at what I thought was midnight no longer came back from a query that I, again, thought started at midnight. Date objects are passed on save and query, but Firestore actually stores them as a Timestamp, and the lack of clear documentation on what should be a pretty common use case - querying on a date range - leads to plenty of questions. I decided to reclaim what was left of my sanity and store unix time instead. I still need to handle the local-to-UTC conversion on the client, but Firestore is no longer doing anything more than storing a number.
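To make the "midnight to midnight" case concrete, here's a minimal sketch of how a date-range query looks once the value is just a number. It assumes the same items collection and firebase.js setup as the backfill script further down; getItemsForToday and toUnixSeconds are made-up helper names.

import "firebase/firestore";
import firebase from "./firebase";

const db = firebase.firestore();
const itemsCollection = db.collection("items");

// Convert a Date to whole unix seconds, matching the dateUnix field
const toUnixSeconds = (date) => Math.floor(date.getTime() / 1000);

export const getItemsForToday = () => {
  // Midnight to midnight in the user's local time zone
  const start = new Date();
  start.setHours(0, 0, 0, 0);
  const end = new Date(start);
  end.setDate(end.getDate() + 1);

  // Firestore just compares numbers - no Timestamp conversion involved
  return itemsCollection
    .where("dateUnix", ">=", toUnixSeconds(start))
    .where("dateUnix", "<", toUnixSeconds(end))
    .get();
};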

After updating the app to handle a new field (dateUnix) going forwards, a backfill of the old data was required. The example code below takes every item and stores the converted date value in a new dateUnix field. I was surprised to discover there is no way to query by absence of a field (something like .where('dateUnix', '==', undefined) doesn't exist), so instead you need to loop through every item and update it if necessary. The code below is designed to run in the browser and assumes you have your Firebase configuration in a file called firebase.js.

import "firebase/firestore";
import firebase from "./firebase";

const db = firebase.firestore();
const itemsCollection = db.collection("items");

export const bulkUpdate = async () => {
const limit = 50;
let allItemsResult = await itemsCollection.limit(limit).get();
let read = allItemsResult.docs.length;

while (read > 0) {
const batch = db.batch();
let updated = 0;

allItemsResult.docs.forEach((queryResult) => {
const doc = queryResult.data();

if (!doc.dateUnix) {
updated++;

batch.update(queryResult.ref, {
// getTime() returns milliseconds
// We convert to seconds and remove any fractional part
dateUnix: (doc.date.toDate().getTime() / 1000) | 0,
});
}
});

await batch.commit();
console.log(`Updated ${updated} of ${read} items!`);

const lastVisible = allItemsResult.docs[read - 1];
allItemsResult = await itemsCollection
.startAfter(lastVisible)
.limit(limit)
.get();
read = allItemsResult.docs.length;
}
};

Some things to note about the script:

  • We get 50 items at a time
    • We could get everything if the collection is smaller, though then the example wouldn't feature pagination
  • We work in batches of 50 until there are no items left
    • Working a record at a time does work, though batching speeds things up (see the single-document sketch after this list)
  • If the document doesn't have a dateUnix property, we add it
  • We create the new field by using existing data on the document
  • We use a batched write so each page of updates is committed in a single round trip
  • When we get the next set of 50 items, we use the last item we saw to control where the next page starts
  • Progress is reported to the console...definitely an MVP implementation!
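
For comparison, this is roughly what the one-record-at-a-time version would look like. It reuses the itemsCollection from the script above; slowBackfill is just an illustrative name, and every document costs its own round trip, which is why the batched version is preferable.

export const slowBackfill = async () => {
  const snapshot = await itemsCollection.get();

  for (const queryResult of snapshot.docs) {
    const doc = queryResult.data();

    if (!doc.dateUnix) {
      // One network round trip per document instead of one commit per page
      await queryResult.ref.update({
        dateUnix: (doc.date.toDate().getTime() / 1000) | 0,
      });
    }
  }
};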

So far I've only had to do this on collections with a few hundred documents and it finished in less than a second. I also had the luxury of knowing the collection wasn't sustaining any concurrent modifications, so I have no idea how this performs on a busy collection.

One thing I've started to do is add an insertedDate value (using serverTimestamp) to all documents. I could have used it to order the collection and only process documents created before the app started writing dateUnix (since all new documents get that field). The other advantage of an insertedDate is that it lets you keep track of progress, or even partition the work if you have a very large collection to update.
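
A rough sketch of that idea, using the same collection as before; addItem, getNextBackfillPage, and cutoverDate are illustrative names, and I haven't tried this on a large collection.

// Stamp every new document with a server-side creation time
export const addItem = (item) =>
  itemsCollection.add({
    ...item,
    insertedDate: firebase.firestore.FieldValue.serverTimestamp(),
  });

// Page through only the documents created before the app started
// writing dateUnix (cutoverDate is whenever that change shipped)
export const getNextBackfillPage = (cutoverDate, limit) =>
  itemsCollection
    .where("insertedDate", "<", cutoverDate)
    .orderBy("insertedDate")
    .limit(limit)
    .get();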