# CockroachDB internship project: Speeding up some interleaved table deletes by a factor of 10 billion

Last summer, I did an internship with Cockroach Labs, makers of CockroachDB, a SQL database built for massive distribution. I worked on SQL language semantics in CockroachDB, which let me touch many different facets of the project.

Overall, my theme for the summer was finding ways to improve the performance of mutation statements - that's your INSERTs, UPDATEs, and DELETEs. At the tail end of the internship, I was able to contribute a major performance gain by adding a fast path to a particular kind of DELETE, involving a kind of table called an interleaved table. This post covers that fix and how it works.

All the work described in this post comes from this pull request.

# Trying to organize my Twitter timeline, using unsupervised learning

I'm a frequent user of Twitter, but I realize that among the major social networks it could be the hardest to get into. One of the big obstacles for me was that, as I followed more and more people representing my different interests, my timeline became an overcrowded mess with too many different types of content. For example, at one point I started following many tech-related accounts and comic book art-related accounts at the same time, and when I would go on Twitter I could never reasonably choose to consume content from only one of the groups.

Even after learning to adapt to this, I still thought that it would be nice to be able to detect distinct groups among the Twitter accounts that I followed. The impetus to finally start a project about this came when I started using cluster analysis algorithms in my machine learning class - they seemed to be exactly the right idea for this kind of community detection. With that, I set off to collect and analyze the data from my own Twitter follow list, with clusters!

The work I've done since then is still in progress (mostly because the results I'm getting aren't that great yet), and as I make more progress I'll be making more posts about it!

All the code is available on GitHub.

More details below!

I've just implemented RSS on my blog. You can find the feed at /recent.atom.

Implementing this was relatively simple: there was a library for it! The Flask website had a short tutorial that was easy to adapt to my own database models.
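To give a rough idea of what such a feed involves, here is a minimal sketch that builds an Atom document using only Python's standard library. It is not the library-based code this blog uses: the function name `build_atom_feed`, the post dictionary keys (`title`, `url`, `updated`, `summary`), and the example URLs are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from xml.etree import ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def _tag(name):
    """Return a namespace-qualified Atom tag name."""
    return f"{{{ATOM_NS}}}{name}"


def build_atom_feed(title, feed_url, author_name, posts):
    """Build an Atom feed string from a list of post dicts.

    Each post is a dict with hypothetical keys: title, url,
    updated (a timezone-aware datetime), and summary.
    """
    # Serialize Atom as the default namespace instead of an "ns0:" prefix.
    ET.register_namespace("", ATOM_NS)

    feed = ET.Element(_tag("feed"))
    ET.SubElement(feed, _tag("title")).text = title
    ET.SubElement(feed, _tag("id")).text = feed_url
    ET.SubElement(feed, _tag("link"), href=feed_url, rel="self")
    author = ET.SubElement(feed, _tag("author"))
    ET.SubElement(author, _tag("name")).text = author_name

    # The feed's "updated" time is that of the most recent post.
    newest = max((p["updated"] for p in posts),
                 default=datetime.now(timezone.utc))
    ET.SubElement(feed, _tag("updated")).text = newest.isoformat()

    # Emit entries newest first, as a "recent posts" feed would.
    for post in sorted(posts, key=lambda p: p["updated"], reverse=True):
        entry = ET.SubElement(feed, _tag("entry"))
        ET.SubElement(entry, _tag("title")).text = post["title"]
        ET.SubElement(entry, _tag("id")).text = post["url"]
        ET.SubElement(entry, _tag("link"), href=post["url"])
        ET.SubElement(entry, _tag("updated")).text = post["updated"].isoformat()
        ET.SubElement(entry, _tag("summary")).text = post["summary"]

    return ET.tostring(feed, encoding="unicode")
```

In a Flask app, a view function for `/recent.atom` could query the most recent posts from the database, pass them to a function like this, and return the result with the `application/atom+xml` content type.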

# Destroying Cockroaches and the Hackathon Experience

On the weekend of March 18th-19th, I worked with a few of my fellow UBC students, Gareth Ellis, Alexander Hoar, and Jeffrey Doyle (and lost a ton of sleep), at the hackathon nwHacks 2017. One of the new and more prominent sponsors of the event was Cockroach Labs, creators of the distributed database CockroachDB, and we thought it'd be fun to work on a project involving CockroachDB and shoot for the "Best Use of CockroachDB" sub-contest they were running (with a nice cash prize!).

CockroachDB is a SQL database that sets itself apart from other relational database systems by being distributed and really fault-tolerant. By using the Raft algorithm to ensure consensus across the nodes in a cluster, it's able to provide a CP (consistent and partition-tolerant) system while at the same time being Highly Available (source). When I installed it for the first time in the days leading up to the hackathon, I was surprised at how easy it was to set up a cluster (just start one instance and let the other instances join it!) and to use the admin interface present on each node. We decided to do a project that would show off CockroachDB's strengths by stress-testing a cluster, attempting to disrupt the consistency of the system.

Our project is DESTROY ALL ROACHES!! (GitHub repo), a JavaScript-based game where you kill cockroaches on screen by remaining in close proximity to them. Each cockroach corresponds to a real instance of CockroachDB running on the server's backend, and all of the instances are part of the same cluster. Every time the player kills a cockroach, the web server executes a kill -9 command on the instance that the cockroach is associated with. The server spawns new cockroaches (and new instances) depending on the ratio between how many cockroaches have been killed and how many were spawned at the beginning of the game session.
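The cockroach-to-process mapping can be sketched roughly as follows. This is an illustrative Python sketch, not the project's actual server code: the `RoachCluster` class and its method names are hypothetical, and the command passed in is a stand-in for whatever command launches a real CockroachDB node. It relies on POSIX signals, so it assumes a Unix-like system.

```python
import os
import signal
import subprocess
import sys


class RoachCluster:
    """Tracks which child process backs each on-screen cockroach.

    Hypothetical helper: `command` stands in for the command that
    would start one CockroachDB node.
    """

    def __init__(self, command):
        self.command = command
        self.nodes = {}    # cockroach id -> subprocess.Popen
        self.next_id = 0

    def spawn(self):
        """Start a new process and return the id of its cockroach."""
        proc = subprocess.Popen(self.command)
        roach_id = self.next_id
        self.next_id += 1
        self.nodes[roach_id] = proc
        return roach_id

    def kill(self, roach_id):
        """Send SIGKILL to the process behind a cockroach.

        This is the equivalent of running `kill -9 <pid>`: the process
        gets no chance to shut down cleanly.
        """
        proc = self.nodes.pop(roach_id)
        os.kill(proc.pid, signal.SIGKILL)
        # Reap the child; on POSIX the return code is the negated signal number.
        return proc.wait()


if __name__ == "__main__":
    # Stand-in "node": a Python process that just sleeps.
    cluster = RoachCluster([sys.executable, "-c", "import time; time.sleep(60)"])
    roach = cluster.spawn()
    print("exit status after kill:", cluster.kill(roach))
```

Because SIGKILL cannot be caught or ignored, the killed node behaves like a machine that crashed abruptly, which is the kind of failure the rest of the cluster is expected to tolerate.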

Our project was quite challenging to implement and is built upon quite a few hacks, which was expected given that we are completely misusing CockroachDB on purpose. I'm really proud of the four of us for being able to finish each of our parts of the project, and integrate all of them together, within 24 hours.

Technical details below!