CockroachDB internship project: Speeding up some interleaved table deletes by a factor of 10 billion
Last summer, I did an internship with Cockroach Labs, makers of CockroachDB, a SQL database built for massive distribution. I worked on the SQL language semantics in Cockroach, which let me touch many different facets of the project in that area.
Overall, my theme for the summer was finding ways to improve the performance of mutation statements, such as DELETE. At the tail end of the internship, I was able to contribute a major performance gain by adding a fast path to a particular kind of DELETE involving a kind of table called an interleaved table. This post is about that performance fix and how it works.
All the work described in this post actually comes from this pull request.
Trying to organize my Twitter timeline, using unsupervised learning
I'm a frequent user of Twitter, but I realize that among the major social networks it could be the hardest to get into. One of the big obstacles for me was that, as I followed more and more people representing my different interests, my timeline became an overcrowded mess with too many different types of content. For example, at one point I started following many tech-related accounts and comic book art-related accounts at the same time, and when I would go on Twitter I could never reasonably choose to consume content from only one of the groups.
Even after learning to adapt to this, I still thought it would be nice to be able to detect distinct groups among the Twitter accounts that I followed. The impetus to finally start a project about this came when I started using cluster analysis algorithms in my machine learning class - they seemed to be exactly the right idea for this kind of community detection. With that, I set off to collect and analyze the data from my own Twitter follow list and find its clusters!
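To give a flavor of the approach: a clustering algorithm like k-means takes a feature vector per account and groups nearby vectors together. The sketch below is purely illustrative - the features and accounts are made up, and it's a from-scratch k-means rather than the exact method I used on the real Twitter data.

```python
# Minimal k-means sketch for grouping followed accounts.
# All data here is made up for illustration.
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means: assign each point to its nearest centroid,
    recompute centroids as cluster means, and repeat."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Distance from every point to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

# Rows: accounts I follow; columns: made-up binary interest features.
X = np.array([
    [1, 1, 0, 0],  # tech account
    [1, 1, 1, 0],  # tech account
    [0, 0, 1, 1],  # comics account
    [0, 1, 1, 1],  # comics account
])
labels = kmeans(X, k=2)
```

On this toy input the two tech accounts end up in one cluster and the two comics accounts in the other, which is the kind of community separation I was after on my real timeline.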
The work I've done since then is still in progress (mostly because the results I'm getting aren't that great yet), and as I make more progress I'll be making more posts about it!
All the code is available on GitHub.
More details below!
Destroying Cockroaches and the Hackathon Experience
On the weekend of March 18th-19th, my fellow UBC students Gareth Ellis, Alexander Hoar, and Jeffrey Doyle and I worked together (and lost a ton of sleep) at the hackathon nwHacks 2017. One of the new and more prominent sponsors of the event was Cockroach Labs, creators of the distributed database CockroachDB, and we thought it'd be fun to work on a project involving CockroachDB and shoot for the "Best Use of CockroachDB" sub-contest they were running (and giving out a nice cash prize!).
CockroachDB is a SQL database that sets itself apart from other relational database systems by being distributed and highly fault-tolerant. Leveraging the Raft algorithm to ensure consensus across nodes in a cluster, it creates a CP (consistent and partition-tolerant) system while at the same time being highly available (source). When I installed it for the first time in the days leading up to the hackathon, I was surprised at how easy it was to set up a cluster (just start one instance and let the other instances join with it!) and use the admin interface present on each node. We decided to do a project that would exemplify CockroachDB's strengths by stress-testing a cluster, attempting to disrupt the consistency of the system.
Killing a cockroach in the game triggers a kill -9 command on the particular instance that the cockroach was associated with. The server spawns new cockroaches (and new instances) depending on the ratio between how many cockroaches were killed and how many cockroaches were spawned at the beginning of the game session.
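One plausible reading of that respawn rule can be sketched as follows - the function name and exact formula are my own guesses, since the post only says the server scales spawns with the kill ratio:

```python
# Hypothetical sketch of the respawn rule described above. The exact
# formula the server used isn't given here; this is one plausible reading.
def replacements_to_spawn(killed: int, spawned_at_start: int) -> int:
    """Spawn more replacement cockroaches as the kill ratio climbs."""
    if spawned_at_start <= 0:
        return 0
    kill_ratio = killed / spawned_at_start  # e.g. 5 kills of 10 -> 0.5
    # Scale replacements with the ratio, rounding down.
    return int(kill_ratio * spawned_at_start)

print(replacements_to_spawn(5, 10))  # prints 5 under this illustrative rule
```

Under this illustrative rule, killing half the initial cockroaches spawns half that many replacements, so the swarm (and the cluster's node count) keeps churning for as long as the game session runs.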
Our project was quite challenging to implement and is built upon quite a few hacks, which was expected given that we were completely misusing CockroachDB on purpose. I'm really proud of the four of us for finishing our individual parts of the project, and integrating them all together, within 24 hours.
Technical details below!
This post reflects the technology used in an earlier version of this website.
Hey, it's the first post!
Over the reading break, I had the opportunity to patch together a blog for this website. I hadn't done straight-up full-stack web dev in a while, so it was nice to just bang out a full application with nice functionality and a decent interface. I've wanted to make my own full website for myself since I was about 7 years old, so I guess that's also some sort of an achievement!
I was inspired to make this blog separate from any of my social media and other external accounts because I realized that you can only depend so much on external services; the life of this website won't be tied to the life of any of those. Hossein Derakhshan's post about the state of internet content solidified these sentiments for me; I hope that someday soon, more people will start their own blogs again.
More technical details about how I built this thing are available below; if you're less interested in that, then welcome!