I've been interested in the analysis and application of logging data for many, many years, certainly many more than I've devoted to logworm per se.

One could say that my first exposure to event data, and to the difficulties of analyzing it, happened back in 1997, when I was a graduate student at the Experimental Knowledge Systems Lab. Our work focused on AI algorithms by which robots could build cognitive abstractions based on their interactions with their environment --just like babies are thought to do. We would let these robots roam around our office, moving randomly and recording information from their sensors. In some circumstances the robots would, for example, get stuck under a chair or a table, and they'd keep performing random actions until some of them (e.g. spinning their wheels forward or backwards) got them unstuck. The goal was to enable them to build cognitive abstractions (Lisp structures) through which they could understand that if they got stuck, they could try spinning their wheels in the opposite direction to get unstuck. The main difficulty, of course, was to analyze the almost real-time streams of data from several sensors (velocity, wheel rotation, direction, etc.) and correlate them into a coherent whole, taking into account the time lapses (i.e., the effect of spinning your wheels backwards is only noticed a few steps later, when the velocity sensor starts reporting positive numbers again). I think that project came too early for its time.
That project was my first exposure to event data, but at EKSL I was just an observer, a first-year grad student. In 2000, on the other hand, I was already working on real applications, and at that time I had the chance to help build a Palm Pilot + web application that allowed psychiatric patients to very quickly record their moods at regular intervals. The backend application later analyzed those entries and provided patients with reports and suggestions, while also allowing their doctors to track the effect of their medications more accurately. That was my first hands-on experience in logging and the processing of event data.
After that project, I spent the next 6 years working at the Knowledge Discovery Laboratory at UMass, on what's called "relational data-mining": the search for patterns in very large databases made up of entities and the relations among them. In many cases the data that we were investigating were basically logs of events, as in the stock-exchange fraud work that we performed for the NASD, where we tried to infer fraudulent relationships between brokers by analyzing very detailed logs of transactions, reports of frauds, changes in employment, etc. During that time we built Proximity, an analysis and research tool that gave me direct experience with the problems of very large amounts of data.

In 2008, already on my own, I spent a few months working as a contractor for Comcast, proposing a centralized repository for their logging information, along with tools to retrieve, parse, analyze, and automatically react to those logs. This was the first time I worked on logging data created by running computer applications, and through it I became very well acquainted with Splunk, the all-powerful log analysis tool. It was at that time too that I first became interested in providing log storage and analysis solutions to smaller-scale developers and startups. With my partners at the time we applied to the YCombinator startup competition, proposing to build some sort of Splunk for the masses. Our application wasn't successful, but it was nevertheless a wonderful experience that encouraged us to start working on a real prototype.
Later that year, I was lucky to meet Matt Stevens at Comcast (now at Akamai), and we worked together on a very-large-scale telemetry application to record, analyze, and summarize logging information related to Comcast's largest web properties, with several billion requests per month. Automatically gathering logs from Akamai, from several of their internal servers, and from external crawlers of the site, the backend would construct a picture of how well the entire system was working, identify the weak spots, alert when errors occurred (at that scale, a single typo in a link can cause tens of millions of costly 404s per day), and allow all stakeholders (developers, product teams, business analysts, SEO experts, ops teams, etc.) to focus on the part of the data that interested them and see how their choices affected other groups. It was an extraordinary product, I think.
Around that time, I also happened to read Jeff Atwood's blog post about logging, and I was immediately drawn to one of his conclusions:
If it's worth saving to a logfile, it's worth showing in the user interface. This is the paradox: if the information you're logging is at all valuable, it deserves to be surfaced in the application itself, not buried in an anonymous logfile somewhere. Even if it's just for administrators. Logfiles are all too often where useful data goes to die, alone, unloved and ignored.
"Exactly!", I thought. This is precisely what we need: not just the mechanisms that allow developers to record log events, but also the tools that let them a) act on that information, and b) show that information in the UI, not just for end-users, but also for business folks, for the operations team, for the developers themselves, for the clients of the developers (if they are freelancers, for example), etc. --just like the tool we built at Comcast did, turning data into knowledge for all stakeholders. All of these groups have different needs, and the tools should therefore be different from one another, but they should all share a common basic foundation: a solid, reliable, and secure repository of data, and a powerful mechanism to query that data.
And that's how logworm, in its current incarnation, was built. Thanks to the incredible advances in technologies in just one year, we realized that we could now a) use hosted MongoDB instances and therefore outsource the problem of scaling for large amounts of data, b) host our service in the cloud using Heroku, therefore outsourcing the problem of scaling our system as demand grew, and c) use Heroku itself and its add-ons platform as a readily-available marketplace to test the viability of our ideas. And this is where we are now.
Through all these years, I never thought of myself as working on the acquisition and analysis of, and the automatic response to, "logging data" --but all the experience is there, and hopefully logworm will show it.
- Agustin