Writes ByteByteGo newsletter -
2024-08-13 - Counting billions of content usage at Canva
- https://blog.bytebytego.com/p/counting-billions-of-content-usage
- Original design - Used a MySQL database, with separate worker services for data collection, deduplication, and aggregation. This ran into scalability issues.
- I was wondering why they needed deduplication and why it had become a big deal. Deduplication hints that the same event was being logged more than once. They probably have some rule that counts repeated views as a single usage. Say I’m using an image that I bought: I might be tweaking my design, and the page might reload a few times. You probably can’t count every reload as a separate usage.
- Next, they started migrating to DynamoDB since it can handle things at scale. They didn’t complete this migration.
- Finally found success by migrating to an OLAP system using Snowflake.
- They say they reduced the latency from over a day to under an hour. Wow. Seems like a Hadoop kind of system.
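
To make my guess about deduplication concrete, here is a minimal Python sketch. Everything in it is an assumption of mine, not something from the article: the `UsageEvent` fields, the rule that one (user, content, design) combination counts as a single usage, and the in-memory `set` standing in for whatever store a real pipeline would use.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class UsageEvent(NamedTuple):
    # Hypothetical event shape; the real schema is not described in my notes.
    user_id: str
    content_id: str
    design_id: str


def count_usages(events: Iterable[UsageEvent]) -> Counter:
    """Count usages per content item, collapsing repeated events.

    Assumed rule: the same user using the same content in the same
    design counts once, no matter how many times the page reloads.
    """
    seen = set()
    counts = Counter()
    for event in events:
        key = (event.user_id, event.content_id, event.design_id)
        if key in seen:
            continue  # duplicate (e.g. a page reload), skip it
        seen.add(key)
        counts[event.content_id] += 1
    return counts


events = [
    UsageEvent("u1", "img-1", "d1"),
    UsageEvent("u1", "img-1", "d1"),  # reload of the same design
    UsageEvent("u2", "img-1", "d2"),
    UsageEvent("u1", "img-2", "d1"),
]
usage_counts = count_usages(events)
print(dict(usage_counts))
```

With a rule like this, the two identical `u1` events collapse into one usage. At billions of events the `seen` set would not fit in memory, which is presumably part of why deduplication became a big deal for them.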