HackerNews Cheaters: Catch me if you can

HackerNews is the most popular news site in the tech industry and one of the top choices for new product releases. The made-up term “HackerNews effect” is often used to describe the traffic spikes brought by popular HN stories. With so many threads competing for attention, the front page has become the holy grail for product launches, since most people only refresh the front page to catch the hottest topics. The competition for its limited spots is nothing less than fierce. As in any other man-made system, cheating is inevitable when the cost of cheating is low and the payoff is huge.

We have been collecting HN stories since Dec. 2014 for our daily news digest service. Using the HN API, we were notified whenever anything changed on HN. Instead of keeping only the latest status of a story, we stored every revision we crawled. We had no idea at the beginning what could be done with the data, but having a copy of the full history of HN sounded interesting and potentially useful for future analysis.
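A minimal sketch of the "store every revision" idea is shown here. The `RevisionStore` class and its in-memory layout are hypothetical illustrations, not our actual crawler; only the item endpoint of the public HN API is taken as given.

```python
import json
from urllib.request import urlopen

# Base URL of the public HN API.
API = "https://hacker-news.firebaseio.com/v0"


def fetch_item(item_id):
    """Fetch the current state of one item (story, comment, ...) as a dict."""
    with urlopen(f"{API}/item/{item_id}.json") as resp:
        return json.load(resp)


class RevisionStore:
    """Hypothetical store that keeps every distinct snapshot of an item,
    rather than overwriting it with the latest one."""

    def __init__(self):
        self.revisions = {}  # item id -> list of (crawl_time, snapshot)

    def record(self, crawl_time, snapshot):
        """Append the snapshot if it differs from the last stored one.

        Returns True when a new revision was stored, False otherwise.
        """
        history = self.revisions.setdefault(snapshot["id"], [])
        if history and history[-1][1] == snapshot:
            return False
        history.append((crawl_time, dict(snapshot)))
        return True
```

Calling `store.record(time.time(), fetch_item(item_id))` on every change notification would build up the per-story history used in the graphs later in this post. A story's rank is not part of the item itself, so it would have to be recorded separately, for example from the ordering of the API's top stories list.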

We submitted our first "Show HN" story about our service in Jan. 2015. We didn’t pick a specific time to submit and we didn’t ask friends to upvote; we simply believed it would reach the front page because it was such a good product. Unsurprisingly, we didn’t make it. So we started to look into the HN data we had collected to understand how the system works and to improve our strategy for future submissions. To our surprise, we found that some front page stories had abnormal growth graphs, and the only explanation we could find was that people were cheating to get on the front page.

In its simplest form, the score HN uses to rank stories is based on two factors: upvotes, which count positively, and time since submission, which counts negatively. Upvotes are like fuel that lifts your position, and time since submission is like gravity that pulls you down. To get on the front page, you have to collect as many upvotes as possible shortly after submission. To keep your place, you have to keep getting upvotes.
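The fuel-and-gravity picture can be sketched with a widely circulated approximation of the ranking formula. This is an assumption for illustration only: the exponents 0.8 and 1.8 and the 2-hour offset come from that commonly quoted approximation, not from anything confirmed by HN, and the real system applies further adjustments that are not modeled here.

```python
def rank_score(points, age_hours, gravity=1.8):
    """Approximate ranking score: upvotes lift it, age pulls it down.

    Assumed formula (not confirmed by HN):
        (points - 1) ** 0.8 / (age_hours + 2) ** gravity
    """
    # A story starts at 1 point (the submitter's own), so it is discounted.
    lift = max(points - 1, 0) ** 0.8
    # The denominator grows with age, acting as the "gravity" term.
    drag = (age_hours + 2) ** gravity
    return lift / drag


# The same 10 points are worth much less five hours in than one hour in.
early = rank_score(10, age_hours=1)
late = rank_score(10, age_hours=5)
```

Under this assumed formula, a handful of upvotes in the first hour outweighs many more upvotes several hours later, which is why early votes are the ones worth cheating for.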

All examples in this post are real stories, but we will not name the authors or threads involved. We also can’t guarantee that our theory is 100% correct; these are just natural conclusions from our assumptions. Nothing has been confirmed by HN or Y Combinator.

Below is an example showing the growth graph of a recent, very popular story.

The green line is the number of upvotes, which grows rapidly at the beginning and then slows down gradually. The blue line is the story’s rank. When it is below the dotted red line, the story is on the front page. This chart looks perfectly normal: the green line is smooth and the blue line is relatively stable.

Let's look at some abnormal ones.

This one got 3 points in one hour while it was outside the top 300, but only 4 points in almost three hours on the front page. With the huge traffic the front page gets, it’s hard to believe that a good story would earn only 4 or 5 upvotes in its 3-hour window there, which makes the first 3 upvotes look very suspicious.
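The comparison above can be turned into a rough check over a story's stored revisions. This is only a sketch: the `(hours, score, rank)` tuple layout, the function name, and the front page size of 30 are assumptions for illustration, and a higher off-page rate is a hint worth a closer look, not proof of cheating.

```python
def votes_per_hour(revisions, front_page_size=30):
    """Return (off_page_rate, on_page_rate) in upvotes per hour.

    `revisions` is a time-sorted list of (hours_since_submission, score, rank)
    tuples. Each interval between two revisions is attributed to the front
    page if the story's rank at the start of the interval was within it.
    """
    gained = {True: 0, False: 0}
    hours = {True: 0.0, False: 0.0}
    for (t0, s0, r0), (t1, s1, _r1) in zip(revisions, revisions[1:]):
        on_front = r0 <= front_page_size
        gained[on_front] += s1 - s0
        hours[on_front] += t1 - t0

    def rate(key):
        return gained[key] / hours[key] if hours[key] else 0.0

    return rate(False), rate(True)


# Shape of the story above: 3 points in one hour far from the front page,
# then 4 points in three hours on it (ranks here are made-up placeholders).
story = [(0.0, 1, 400), (1.0, 4, 25), (4.0, 8, 40)]
off_rate, on_rate = votes_per_hour(story)
suspicious = off_rate > on_rate
```

A story that collects votes faster where almost nobody can see it than on the front page is the pattern that made us look twice.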

These two stories were submitted at almost the same time. The first one failed to reach the front page even after more than 10 upvotes, while the second one got there after 2 upvotes. This suggests the first one was very likely a failed attempt to cheat that was detected by the system.

This one is very mysterious. The story suddenly became popular and reached the front page more than 6 hours after submission, with only a few upvotes. Maybe someone with extremely high karma upvoted it. However, there is also a contradictory theory that upvoters’ karma does not affect a story’s ranking. We can’t explain this phenomenon, but one thing is for sure: it happens frequently on HN.

Analyzing HN front page stories and catching cheaters is a lot of fun. We’re thinking about releasing this feature to the public. If you’d like to be among the first to test it and give us feedback, please click the button below to log in with your GitHub account and request an invitation.

We think this is a challenge for HN. Getting on the front page is too random, and the existence of cheaters makes it even harder. Some very interesting stories get buried for lack of upvotes right after submission. We have put some effort into solving this problem for the open source community. If you rely on HN to discover new and trending open source projects, please check out our explore page to find GitHub projects submitted to HN every day.

EDITED: We submitted this post to HN. There are many insightful comments there that are definitely worth reading.