Super fast log processing ..... with a few buts unfortunately

Posted by ben 29 Jul 2010

My awesome new log processing software went live the other day and it's caused a little bit of disruption. A lot of things went right but some things went wrong:

What went right

A bunch of stuff went right and overall this new version of the software is a stunning success:

  • Logs are now processed about 30 to 40 times faster than before. The old software was getting bogged down and, in the worst cases, spending almost an hour on a batch of files, which of course meant there were more files waiting for it when it finally finished. The new software can process a batch of files in as little as 2 or 3 minutes.

  • Because the new software's so fast, I was able to reduce the log file size by 40%, so the files are completed and sent off for processing a lot sooner.

  • The negative bounce rate problem was fixed.

  • A not-so-obvious timezone problem was fixed, so your stats are now properly aligned with the timezone you select in your profile.

What went wrong

A bunch of stuff went wrong which has been a major pain in the ass to rectify:

  • To fix the negative bounce rates I decided to reprocess July's log files, and it took a few goes to get the software working properly in reprocessing mode. Along the way it doubled the numbers for some days and reduced them for others.

  • A bug meant it assumed your metricid (regardless of type) was the last metricid it had encountered so people got some weird stuff in their reports.

  • The new software operates almost entirely in memory, and I was using dates with different times as the key to the in-memory versions of each day's data. That created hundreds of versions of the same day, which resulted in a massive 8 gigabyte swapfile and also messed up some stats.

  • Metric names weren't being unescaped, resulting in duplicate metrics for 'some thing' and 'some%20thing'.
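As a rough sketch of the last two fixes (not the actual production code; the function names and the (timestamp, name, count) event layout are made up for illustration), the idea is to key the in-memory data on the calendar date alone and to decode metric names before using them as keys:

```python
from collections import defaultdict
from datetime import datetime
from urllib.parse import unquote


def bucket_key(timestamp, metric_name):
    """Build one key per (calendar day, decoded metric name).

    Dropping the time of day means every event on the same day shares a
    bucket, and unquote() turns 'some%20thing' into 'some thing'.
    """
    return (timestamp.date(), unquote(metric_name))


def aggregate(events):
    """Sum counts per day and metric.

    `events` is a hypothetical iterable of (timestamp, name, count) tuples.
    """
    totals = defaultdict(int)
    for timestamp, name, count in events:
        totals[bucket_key(timestamp, name)] += count
    return dict(totals)


events = [
    (datetime(2010, 7, 25, 9, 15), "some thing", 3),
    (datetime(2010, 7, 25, 21, 40), "some%20thing", 2),
    (datetime(2010, 7, 26, 0, 5), "some thing", 1),
]
totals = aggregate(events)
```

With keys built this way, the three sample events end up in two buckets (one per day) instead of three separate ones.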

What's happening now

July is reprocessing semi-properly now. The unescaped metric names remain an issue that'll be fixed when it completes.

July 25 hasn't been processed but will hopefully be done soon. Because the timezones are correctly applied now, reprocessing 'a day' means examining the log files (over 20 GB/day) of the day before and the day after as well. Because there's so much data, it has to be compressed, which makes it slower to read.
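To show why one day turns into three, here's a minimal sketch. It assumes log entries are stamped in UTC and that log files are split by UTC day (neither is spelled out above), and it uses a fixed UTC offset in hours instead of real timezone rules, so daylight saving is ignored. The function names are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone


def log_days_to_scan(day):
    """Log-file dates that can hold events for `day` in some timezone."""
    return [day - timedelta(days=1), day, day + timedelta(days=1)]


def in_local_day(event_utc, day, utc_offset_hours):
    """True if a UTC-stamped event falls on `day` at the given UTC offset.

    Assumption: a fixed whole-hour offset stands in for the timezone
    selected in the user's profile.
    """
    local = event_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.date() == day


target = date(2010, 7, 25)
late_on_24th = datetime(2010, 7, 24, 20, 30, tzinfo=timezone.utc)
early_on_26th = datetime(2010, 7, 26, 3, 0, tzinfo=timezone.utc)

# At UTC+10 the first event is already 06:30 on July 25.
counts_for_utc_plus_10 = in_local_day(late_on_24th, target, 10)
# At UTC-8 the second event is still 19:00 on July 25.
counts_for_utc_minus_8 = in_local_day(early_on_26th, target, -8)
```

Both sample events belong to someone's July 25 even though they sit in the July 24 and July 26 log files, which is why all three days have to be read.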

Until it finishes this monster reprocessing job, the reports are going to look messed up, and some bugs are going to make some pages unavailable in the members system.

Everything should be back on track hopefully within the next 24 hours but it might be the weekend before everything's fully fixed up.

Meanwhile

The donation drive is going great! Andy Moore made a massive donation that helps a lot, and so did Hero Interactive, the guys behind Tumbleball, Thomas from PsyFlash Productions and Hybrid Mind Studios. My goal is to raise a couple to a few grand so I can get the API sorted out in a few extra languages (iOS, Android, etc.), which means I need some hardware.

Massive progress is being made on the next version of the website. It has forums, new blog software, support tickets, and a really cool new feature: charts that highlight the best games, with ties to Flash Game License (or your site/email) to hopefully push nonexclusive sales and generally help with distribution.