## MSc Course Projects 1: Direction of Voice Filter

I’ve just finished the first year of my Master’s program, and so far I’ve done a couple of course projects in a variety of areas. This is the Direction-of-Voice filter, which I built with my colleague Abi Kuganesan for Dr. Robert Xiao’s “Machine Learning for Signal Processing” course. It’s an audio filtering application that removes unintended voice audio from a signal by using machine learning to eliminate the portions of the audio produced by speakers not facing the microphone. Please also check out our presentation and demo, which I’ve uploaded to YouTube!

The full report is also available on GitHub.

There’s not much to update this week; I’ve mostly just been making lifestyle adjustments in preparation for my remote internship, which starts in a week. Comments are open for the first time on this blog. This was definitely a daunting decision to make, but I found it worthwhile. My social media addiction post last week got a really good response: two comments on the post, several other people reaching out to me on private channels, and some good conversations started.

## Trying to understand social media addiction

I’ve spent a little bit of time combing through research papers about social media, mostly by using Google Scholar. I didn’t have a particular goal in mind when setting out on this exploration, but I was primarily interested in the links between social media usage and mental health, as well as the analysis of social media addiction (how prevalent it is, what forms it takes, which patterns differ among cultures and which are common along cultural lines, etc.).

## Logs 2020-05-11: Sequences of Sets

Here’s a quick little math one. In a probability theory class that I was sitting in on, one of the core concepts taught was the limit infimum and limit supremum of a sequence of sets. If $$A_n$$ is a sequence of sets (that are all subsets of some larger set $$E$$) then the two constructs are also known as $$A_n$$ almost always and $$A_n$$ infinitely often respectively.
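For reference, here are the standard textbook definitions of the two constructs, written in the usual notation (the class may have phrased them differently):

$$\liminf_{n \to \infty} A_n = \bigcup_{n \ge 1} \bigcap_{k \ge n} A_k, \qquad \limsup_{n \to \infty} A_n = \bigcap_{n \ge 1} \bigcup_{k \ge n} A_k$$

An element $$x \in E$$ is in the limit infimum exactly when it belongs to all but finitely many of the $$A_n$$ ("almost always"), and in the limit supremum exactly when it belongs to infinitely many of them ("infinitely often"). In particular, $$\liminf_{n \to \infty} A_n \subseteq \limsup_{n \to \infty} A_n$$.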

## Quarantine Logs 2020-04-13: Circuit-Sim Progress!

Been a week! This is by far the most frequently I’ve ever posted. I’m hoping to keep it up.

I’m happy to report that I made a little progress on the circuit simulator I’ve been working on. Here I’ll get into what that’s all about in a bit more detail. All the relevant code is up on GitHub, though it’s really bare-bones and without documentation as of the time of writing.

Learning about circuit analysis introduced me to the concept of nodes. A node is a point in a circuit where two or more components meet. This concept is important because of Kirchhoff’s Current Law, which states that the sum of currents leaving a node is 0.

$$\text{Current leaving node i} = \sum_{j \in N(i)} I_{i,j} = 0$$
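To make the law concrete, here is a minimal Python sketch. It is not the code from the simulator's repository: the function names `currents_leaving` and `solve_node_voltage` are hypothetical, and it assumes the node is connected to its neighbours only through ideal resistors whose far-end voltages are already known.

```python
def currents_leaving(node_voltage, neighbours):
    """Currents leaving a node through each resistor, by Ohm's law.

    neighbours is a list of (neighbour_voltage, resistance_ohms) pairs.
    (Hypothetical helper for illustration only.)
    """
    return [(node_voltage - v) / r for v, r in neighbours]


def solve_node_voltage(neighbours):
    """Node voltage V for which Kirchhoff's Current Law holds.

    Solves sum_j (V - v_j) / r_j = 0 for V, which gives the
    conductance-weighted average of the neighbouring voltages.
    """
    total_conductance = sum(1.0 / r for _, r in neighbours)
    weighted_voltages = sum(v / r for v, r in neighbours)
    return weighted_voltages / total_conductance


# Assumed example: a voltage divider. The node sees a 10 V source
# through 1 kOhm and ground (0 V) through another 1 kOhm.
neighbours = [(10.0, 1000.0), (0.0, 1000.0)]
node_v = solve_node_voltage(neighbours)

# KCL check: the currents leaving the node should sum to (about) zero.
residual = sum(currents_leaving(node_v, neighbours))
print(f"node voltage = {node_v:.3f} V, KCL residual = {residual:.2e} A")
```

With two equal resistors the node settles halfway between the source and ground, and the current flowing in from the source side cancels the current flowing out to ground, so the sum over all branches is zero up to floating-point error.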

Here’s an example:

## Moving to Hugo and Netlify

I just moved this website over from my makeshift homemade setup on my self-hosted DigitalOcean box to a more convenient stack. See the very first post here to see what the old stack looked like. I’m using Hugo with the whiteplain theme, keeping some of the simplicity of the old design. I’m currently hosting on Netlify. It takes a nontrivial amount of work to migrate a website across setups like this.

## CockroachDB internship project: Speeding up some interleaved table deletes by a factor of 10 billion

Last summer, I did an internship with Cockroach Labs, makers of CockroachDB, a SQL database built for massive distribution. I was working on the SQL language semantics in Cockroach, and I was able to work on many different facets of the project in that area.

Overall, my theme for the summer was finding ways to improve the performance of mutation statements - that's your INSERTs, UPDATEs, and DELETEs. At the tail end of the internship, I was able to contribute a major performance gain by adding a fast path to a particular kind of DELETE, involving a kind of table called an interleaved table. This post is about this particular performance fix and everything about how it works.

All the work described in this post actually comes from this pull request.

## Trying to organize my Twitter timeline, using unsupervised learning

I'm a frequent user of Twitter, but I realize that among the major social networks it could be the hardest to get into. One of the big obstacles for me was that, as I followed more and more people representing my different interests, my timeline became an overcrowded mess with too many different types of content. For example, at one point I started following many tech-related accounts and comic book art-related accounts at the same time, and when I would go on Twitter I could never reasonably choose to consume content from only one of the groups.

Even after learning to adapt to this, I still thought that it would be nice to be able to detect distinct groups among the Twitter accounts that I followed. The impetus to finally start a project about this came when I started using cluster analysis algorithms in my machine learning class - the algorithms used seemed to be exactly the right idea for this kind of community detection. With that, I set out to collect and analyze the data from my own Twitter follow list, with clusters!

The work I've done since then is still in progress (mostly because the results I'm getting aren't that great yet), and as I make more progress I'll be making more posts about it!

All the code is available on GitHub.

More details below!