100 Warm Tunas has been happily chugging away for the last month or so. I’ve obtained a fair amount of media coverage too.
A couple of days back, we posted the site to the triplej subreddit. Someone replied to the post telling me my vote count was significantly less than what they had been counting by hand, which made me somewhat suspicious – was there a bug in my Instagram scraping library that we built?
Well, after a bit of debugging early this morning, we found that there was indeed a bug. Not a bug with our scraping library, but rather a bug with how we were using the library:
{% highlight patch %}
- for page in ig.fetch_pages(’triplej’, per_page=10):
- for page in ig.fetch_pages(hashtag, per_page=10): for post in page.posts: if post.is_video: logger.info(“Skipping {} because it’s a video”.format(post.shortcode)) {% endhighlight %}
For those who are programmers, you’ll probably spot the issue here. For those who aren’t, the issue is that we have been using a hardcoded string to collect Instagram votes, when we thought we were collecting a handful of hashtags.
This has now been rectified and we have kicked off a full re-scrape to back-fill the data.