My interest in the acquisition, analysis, and automatic response to "logging data"

I've been interested in the analysis and application of logging data for many, many years, certainly many more than I've devoted to logworm per se.

One could say that my first exposure to event data, and to the difficulties of analyzing it, happened back in 1997, when I was a graduate student at the Experimental Knowledge Systems Lab. Our work focused on AI algorithms by which robots could build cognitive abstractions based on their interactions with their environment --just like babies are thought to do. We would let these robots roam around our office, moving randomly and recording information from their sensors. In some circumstances the robots would, for example, get stuck under a chair or a table, and they'd keep performing random actions until some of them (e.g. spinning their wheels forwards or backwards) got them unstuck. The goal was to enable them to build cognitive abstractions (Lisp structures) through which they could understand that if they got stuck, they could try spinning their wheels in the opposite direction to get unstuck. The main difficulty, of course, was to analyze the almost real-time streams of data from several sensors (velocity, wheel rotation, direction, etc.) and correlate them into a coherent whole, taking into account the time lapses (i.e., the effect of spinning your wheels backwards is only noticed a few steps later, when the velocity sensor starts reporting positive numbers again). I think it was too early in time for that project.
 
That project was my first exposure to event data, but at EKSL I was just an observer, a first-year grad student. In 2000, on the other hand, I was already working on real applications, and at that time I had the chance to help build a Palm Pilot + web application that allowed psychiatric patients to very quickly record their moods at regular intervals. The backend application later analyzed those entries and provided patients with reports and suggestions, while also allowing their doctors to track the effect of their medications more accurately. That was my first hands-on experience in logging and the processing of event data.
 
After that project, I spent the next 6 years working at the Knowledge Discovery Laboratory at UMass, on what's called "relational data-mining": the search for patterns in very large databases made up of entities and the relations among them. In many cases the data that we were investigating were basically logs of events, as in the stock-exchange fraud work that we performed for the NASD, where we tried to infer fraudulent relationships between brokers by analyzing very detailed logs of transactions, reports of fraud, changes in employment, etc. In that time we built Proximity, an analysis and research tool through which I got direct experience with the problems of very large amounts of data.
 
In 2008, already on my own, I spent a few months working as a contractor for Comcast, proposing a centralized repository for their logging information, along with tools to retrieve, parse, analyze, and automatically react to those logs. This was the first time I worked on logging data created by running computer applications, and through it I became very well acquainted with Splunk, the all-powerful log analysis tool. It was at that time too that I first became interested in providing log storage and analysis solutions to smaller-scale developers and startups. With my partners at the time we applied to Y Combinator, proposing to build some sort of Splunk for the masses. Our application wasn't successful, but it was nevertheless a wonderful experience that encouraged us to start working on a real prototype.
 
Later that year, I was lucky to meet Matt Stevens at Comcast (now at Akamai), and we worked together on a very-large-scale telemetry application to record, analyze, and summarize logging information related to Comcast's largest web properties, with several billions of requests per month. Automatically gathering logs from Akamai, from several of their internal servers, and from external crawlers of the site, the backend would construct a picture of how well the entire system was working, identify the weak spots, alert when errors occurred (at that scale, a single typo in a link can cause tens of millions of costly 404s per day), and allow all stakeholders (developers, product teams, business analysts, SEO experts, ops teams, etc.) to focus on the part of the data that interested them and see how their choices affected other groups. It was an extraordinary product, I think.
 
Around that time, I also happened to read Jeff Atwood's blog post about logging, and I was immediately drawn to one of his conclusions:
 
If it's worth saving to a logfile, it's worth showing in the user interface. This is the paradox: if the information you're logging is at all valuable, it deserves to be surfaced in the application itself, not buried in an anonymous logfile somewhere. Even if it's just for administrators. Logfiles are all too often where useful data goes to die, alone, unloved and ignored. 
 
"Exactly!", I thought. This is precisely what we need: not just the mechanisms that allow developers to record log events, but also the tools that let them a) act on that information, and b) show that information in the UI, not just for end-users, but also for business folks, for the operations team, for the developers themselves, for the clients of the developers (if they are freelancers, for example), etc. --just like the tool we built at Comcast did, turning data into knowledge for all stakeholders. All of these groups have different needs, and the tools should therefore be different from one another, but they should all share a common basic foundation: a solid, reliable, and secure repository of data, and a powerful mechanism to query that data.
 
And that's how logworm, in its current incarnation, was built. Thanks to the incredible advances in technologies in just one year, we realized that we could now a) use hosted MongoDB instances and therefore outsource the problem of scaling for large amounts of data, b) host our service in the cloud using Heroku, therefore outsourcing the problem of scaling our system as demand grew, and c) use Heroku itself and its add-ons platform as a readily-available marketplace to test the viability of our ideas. And this is where we are now. 
 
Through all these years, I never considered that I was working on the acquisition, analysis, and automatic response to "logging data" --but all the experience is there, and hopefully logworm will show it.
 
- Agustin
 
 
Filed under  //   experience   logging   motivation   story  


Paid MongoHQ plans --finally!

I just signed up for the paid MongoHQ plans. I've been trying to pay them for months now, but their billing system wasn't working. They've finally changed it today, and now I have an official account with them --out of beta, getting ready for the public release of logworm.

What I like about the plan they've come up with is that you really pay for what you use. You can create different databases of different sizes and characteristics, and pay for each one of them separately. 

Too bad I won't be able to see them at MongoSF next week.


Still alive!

Haven't posted any updates to the blog in almost a month, but that doesn't mean that there hasn't been any activity on logworm. Quite to the contrary, a lot has happened in these weeks.

Our JIRA shows all the progress that we've made, especially towards completing the support of logworm as a Heroku add-on:

  • Cleaned up and vastly improved the gems
  • Changed all configuration parameters for the DB so that it works easily as a Heroku add-on
  • Implemented Heroku SSO
  • Fixed the visual layout
  • Added a decent amount of testing code (using RSpec)
  • Wrote a lot of documentation pages
  • For internal development purposes, finally created a much overdue staging and testing platform

At this point we're waiting on the Heroku guys to start offering logworm to their alpha users. That testing should go on for about a week, after which we'd be exposed to 500 or so beta users. I think we're ready.


A business is an artifact

A business is an artifact. Something one creates. Not unlike a program or a system. 

It took me a long time to realize this. For years I cared about the product I was working on at the time and hoped that, in spite of my lack of business thinking, it would somehow turn out to be successful and allow me to run a business around it. It doesn't work that way. When you create a business, the business itself is your creation --that is what you think about, what you care about, what you want to see grow. That is what you develop, test, tweak, and document. The particular product you develop is certainly not the end in itself --it's just a means to an end.

That's a hard notion for a computer scientist to accept and embrace. 
Filed under  //   Business   growth  


On the costs of being an entrepreneur in the US and in Europe

I don't know how I came across an excellent post by Martin Varsavsky, about the burdens of starting a company in the US versus Europe. His argument is that, for all the goodness that one can find in the Valley (great talent, a culture of innovation, a hub of collaboration and new ideas), the US actually makes it much harder for startups than Europe does. He cites three reasons: the crazy legal costs (lawyer fees and the litigious culture), the lack of public health insurance (which adds approximately $10K per employee in costs which in Europe don't exist), and the "defense tax" (the fact that such a large percentage of the tax revenue goes to the military that other services like education and public transportation cannot be offered as they are in Europe, and therefore employers must directly or indirectly pay for them for their employees). 

I think that the situation is in fact much worse for VERY small companies (one or two people) than he describes: the cost of health care and education here is so prohibitive that embarking on a new small venture is a tremendously risky proposition, with real chances of ending up in bankruptcy and without access to proper health care. In other words, you are pretty much risking your life and your family’s. Countries with public health care and education are not only providing better standards of living for their people; they are in fact also encouraging entrepreneurship and innovation (by mitigating the inherent risks of the small entrepreneur) in a way that the US, which prides itself on its entrepreneurial spirit, cannot even dream of offering.

In this context, it is no surprise that such a large percentage of entrepreneurs in Silicon Valley are in their 20s; they are precisely those for whom, because of their age, living without health insurance is a reasonable calculated risk (and they normally don’t have children, so they are not risking their kids’ futures either). Regrettably, this also prevents older and more mature people from starting new small businesses, and I think that in the tech world in particular a lot of potential gets lost right there: we overvalue the energy of young entrepreneurs, and give up the potential that comes from the maturity and the experience of tech people with more years in the field.

The flip side of the coin is that I think that right there lies a very good opportunity for VCs: funding more mature and experienced entrepreneurs, who might require bigger initial investments to cover their living expenses, but who in return can skip a lot of the learning process that young ones have to go through.


Keeping the morale up -- one step at a time

Everyone will tell you that the psychological challenges of creating a new product (and taking it seriously as a business) are equally if not more important than the technical and business challenges, especially if you're going at it alone. These past three weeks have felt like a roller-coaster ride, and it takes a lot of wisdom (which I don't always have) to not miss the forest for the trees --both when things are looking good and when they are looking bad.

On the days when I get feedback from beta testers or I manage to focus on small and achievable tasks, I come back home with a great sense of accomplishment, feeling like the king of the world. It's exhilarating to feel that I'm working on something that matters, to hear that people care about what I'm doing, and to see the product or the business get better every day. But then a few days go by in complete isolation, with my head down programming, programming, programming, and I start to doubt the wisdom of these efforts: why am I building this? will anybody care? isn't this just a toy I'm working on? will I be able to keep going long enough to actually give this project a real chance? and why is it going so slowly?

On the bad days I find that the best thing that I can do is step away from the work for a while and get back to the more "human" things that fill me with joy: family, reading literature, running or playing soccer. At the end of the day these activities give me a sense of accomplishment of another kind, and I feel my energies renewed. It's enough to keep the belief alive for another couple of days, and hopefully in that time I'll hear again from potential users, or from people working on similar things or going through similar roller-coasters, to get the positive cycle going again, one more time. Lather, rinse, repeat.

The trick is then to take it one step at a time, like my wife does. After being an architect her entire professional life, she recently decided to change paths and put all her energy into being an artistic photographer, which is where her heart is at the moment. Her challenges are similar to mine in many ways, in particular in how she's at the point where she knows that she has the vision and the talent, but she still needs to build a reputation as an artist instead of "a lady who takes nice pictures". Her approach is to participate in competitions and exhibits, patiently submitting her work for evaluation over and over again (the equivalent of beta testers in her field), trying to learn from the feedback, and slowly increasing her exposure and the level at which she exhibits. Which is why having her submission accepted at the prestigious International Photography Competition organized by the Fraser Gallery in Bethesda, MD felt like such an important accomplishment... to be quickly surpassed by being awarded, unexpectedly, the First Honorable Mention at the exhibit's opening last night, for her picture "Rooflines"!


Rooflines.  Performing Arts Center at Bard College designed by Frank Gehry.  Annandale-on-Hudson, NY.  2008  

If you focus on taking one small step at a time, as opposed to aiming for instant fame and glory, the challenges become manageable and, most importantly, enjoyable.
Filed under  //   Business Tricks  


logworm's first week -- An update

I started this week with lots of goals, mostly having to do with communicating more closely with my beta users, and looking for others interested in giving logworm a try. It ended up being a week of tons of work, but regrettably most of it was directed not towards logworm but towards my consulting job. It's disappointing in a way, but on the other hand a company needs cash flow, and having this consulting job ultimately allows me to extend the time I have to dedicate to logworm --and that's a great thing!

What were the highlights of the week? 
  • I did manage to spend enough time with my beta users, making sure that they have all they need to test (and hopefully enjoy!) logworm. This has been so far the best way to spend my time: not only do I keep these initial "customers" happy, which is certainly necessary, but most importantly I get good feedback from them, and that allows me to work on features that really matter --not just features or solutions that I guess they'll care about, but features that I now know for certain that they'll care about. Plus, having someone say "thank you" because you helped them solve some of their problems is a big boost in terms of self-esteem, and a good confirmation that you're on to something --I find it gives me a lot of energy to keep going. Having users is the best thing that can happen to you. For next week I plan to slow down my rate of development and spend as much time as possible talking with these beta testers, gauging their needs, and solving their problems.
  • I attended the NoSQL Live conference in Boston, about which I'll write in more detail later. It was a great experience, not necessarily because of the talks and presentations themselves, but mostly because of the interesting people I got to meet. Among them was Adam Wiggins, co-founder of Heroku, with whom I could discuss architectural details for logworm and other potential improvements to their platform, including their upcoming marketplace for add-ons.
  • Fortunately I had time to work on some needed logworm improvements. Most importantly, the gems for the client have been cleaned up, unified, tested, and uploaded to Rubygems.org; this means that the installation is now really a breeze: just sudo gem install logworm_client, or simply add the gem to the .gems file. Also, I have vastly improved the documentation, with much cleaner instructions on how to install the gems and what to do once you have data --the first logworm tutorial is slowly taking shape. Finally, thanks to feedback from one of my beta testers, I realized that people are concerned about potential delays to their applications while the client is talking to the server to store the logging information. To mitigate the apprehension, I've added a note in the FAQ, and modified the code so that it now reports the amount of time it spends talking to the server. In my experience, a logging call adds an average of 40ms to the processing of a request.
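
To make the timing report in the last item more concrete, here is a minimal sketch of how a client could measure the time it spends talking to the server. It is an illustration only, not logworm's actual client code: the TimedLogger class, its methods, and the sender object (which stands in for the HTTP call to the logging server) are all hypothetical names made up for this example.

```ruby
# Hypothetical sketch: NOT the actual logworm client. TimedLogger and
# `sender` are illustrative names chosen for this example.
class TimedLogger
  attr_reader :timings_ms

  # `sender` is any object responding to #call(table, payload); it stands
  # in for the network request that stores a log entry on the server.
  def initialize(sender)
    @sender = sender
    @timings_ms = []
  end

  # Sends one log entry and records how long the round trip took.
  def log(table, payload)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = @sender.call(table, payload)
    finished = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @timings_ms << (finished - started) * 1000.0
    result
  end

  # Average overhead per logging call in milliseconds (nil before any call).
  def average_ms
    return nil if @timings_ms.empty?
    @timings_ms.sum / @timings_ms.size
  end
end

# Usage with a fake sender that simulates a slow network call.
fake_sender = ->(_table, _payload) { sleep(0.01); true }
logger = TimedLogger.new(fake_sender)
3.times { |i| logger.log(:signups, { user_id: i }) }
puts format("average logging overhead: %.1f ms", logger.average_ms)
```

Using the monotonic clock rather than Time.now keeps the measurement immune to system clock adjustments, which matters when the numbers being reported are only a few tens of milliseconds.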

The plans for next week are similar: to continue this (at times slow) process of customer-driven development. This means not only responding to customers' requests, but also focusing as much as possible on acquiring new testers. Being able to put myself in the shoes of the person who hears about logworm for the first time and has to decide whether to devote more than 3 seconds of their attention to it allows me to make better decisions when I package the gems, write the documentation, work on the FAQ, design the welcome page, etc.

As they say, one step at a time... and I'm certainly looking forward to the next ones! 

- Agustin
Filed under  //   Business   Progress  


Beginning anew, with a thank you note

My professional career has not followed a clear, linear path. I moved from Argentina to the US when I was 23 and landed in the minuscule town of Amherst, Massachusetts with the grand goal of getting a PhD in Computer Science under the direction of Paul Cohen, a brilliant and inspiring AI researcher. My ambitions changed focus rather quickly and, content with a Master's degree and fortunate to have established strong relationships with clever computer scientists like Cris Pedregal-Martin, I left the university and worked as a contractor with a couple of interesting startups in San Francisco and with my good friend Farshad Nayeri in Boston. After the disasters of 2001 I decided that I, like the country, could use some calm and safety, and so I joined the Computer Science Department at UMass once again, this time as a Senior Architect for the Knowledge Discovery Laboratory, where for six years I was lucky to work with the group's visionary director David Jensen and with Matt Cornell, who taught me how to program clearly, meticulously, and with agility. Then came 2008, a year already marked by change as our first son joined us, and during which I felt it was time to leave UMass once again (maybe not forever?) and set out on my own. I started Pomelo, LLC, a small consulting company, and it took me just a few months to find another great mentor/advisor/inspirer, Matt Stevens, Director of Software Architecture for one of my clients, Comcast. The consulting adventure was a success, both financially and intellectually, as under Matt's direction I did the most creative, innovative, and important work I've done so far. Pomelo has given me two very, very good years but, alas, I feel the itch coming again; it's time for another change...

My work with Matt Stevens in the past year has focused on building reliable and highly-available infrastructures for deploying critical web-scale solutions. Because our project was backed by big budgets, we could base our infrastructure on partners like Akamai and Mashery, cheap and convenient solutions for the enterprise but prohibitively expensive for small developers and startups. I have been pondering for a while whether it'd be possible to bring the advantages of these economies of scale, where the cost per transaction decreases rapidly as the number of them grows, and make them available also to those who don't have the thousands or even hundreds of thousands of dollars necessary to just sit at the table with these big providers. I think it's a very interesting problem, so interesting to me that I have decided to put all my attention into it: I am starting a new (VERY small for now) software business whose mission is to identify abstract pieces of Internet architecture that power reliable (and easy-to-understand) large-scale systems, find clever ways to implement new ones or use existing ones at lower costs, and offer them as services so that small developers and startups can easily incorporate them into their new, creative applications. It's the idea of turning infrastructural components into utilities, as Nicholas Carr describes in The Big Switch; in this case we could call it "Good-Internet-Architecture-as-a-service", if you will.

It is an ambitious goal, but I think it's achievable if we tackle it one step at a time. And so I begin, accordingly, in a humble way, with what I believe to be a very important but very rare piece of Internet architecture: a reliable utility for logging. My first product (still in closed Beta!) is logworm, a hosted service that allows developers to log information from their applications as they run --not just the typical web access logs, but in general any kind of business-related information that they care about-- and then turn that information into knowledge (via detailed reports) or, more importantly, into actions (via programmable interfaces and APIs to access the indexed data).

In the coming months there will be plenty of time to talk about logworm, and about the ups and downs of starting a micro-ISV without much (if any!) experience. For now I simply want to go back to the first paragraph, looking back to the years past, and thank those who have helped me get to the point where I am ready for yet another adventure. With the Oscars just around the corner, I don't want this ending to sound like a typically corny acceptance speech. In fact, that would be ridiculous: I am just at the beginning, very far from having accomplished the mission or from even knowing with at least a minimal amount of certainty that success is inevitable. But, regardless of having the entire mission ahead of me, I want my first task to be to thank those who have in one way or another helped me to get here --by giving me courage and encouragement, by graciously sharing their knowledge with me, or by simply employing me and thus creating a safe haven where I could grow and learn. All those hyper-linked names in the first paragraph have been fundamental in their help, and I want to say public thanks to them. 

And who's going to guide me in this new venture, of necessity more lonesome than the previous ones? I am hoping that past mentors and friends will be willing to continue in their roles as advisors, and I am also hoping to add new guides for this fresh path, people with more experience in the world of micro-ISVs. And, finally, there are already a small number of entrepreneurs who have been kind enough to share their stories with the open public, and to whom I am now turning for advice, regardless of whether they speak to me directly or simply via their blogs and sites. In particular I want to end this post about the beginning by thanking Peldi Guilizzoni, of Balsamiq Studios, for being such an inspiration. I recently sent him a thank you note, which I think is OK to make public here, slightly edited:

Caro Peldi,

I know you must be overwhelmed by the amounts of email that you get, so I hope you get to read this. I just wanted to send you a brief note to thank you for your transparency and openness --I know it's good for business, but it's also refreshing, at a human level, to find somebody who puts decent values first and profits later. Your story (and your family's) has been very inspiring, and it would have been so had your adventure resulted in $50K in revenue instead of $2M. It's not the success that inspires me, but the decision you guys made to embark on this journey. In these times of pure greed and narcissism, I welcome and get inspired by your focus on balance, customer satisfaction, quality over margins, and in general by your desire to build a 'good, honest business'. And, on top of that, your humility about the path you're traveling makes it possible for others like me to relate to it as an equal. Thanks for those values, and for taking the time to share your story.

Thanks much also for recommending 'Growing a Business' in your blog. I've read it twice now and in it I've found the perfect articulation of the goals that I have for the small software business that I'm trying to create. I've spent many years motivated by the wrong things (and being quite successful at it! ;-), and now a cycle is ending and I want to turn around and go back to a more humane set of values, caring about *giving* something first and leaving *me, me, me* as a distant second, and rescuing the ideals and the search for excellence and quality in what I do that I used to have in younger years. In 'Growing a Business' I was delighted to find a successful businessman talking about a similar set of values --do something good. Thanks for that too.

It's not too much of a stretch to say that 'Nel mezzo del cammin di nostra vita / mi ritrovai per una selva oscura / ché la diritta via era smarrita', and that you and Paul Hawken have been my Virgils. Well, to be fair I'm just starting now and I have no guarantee that one day I'll manage to leave the fun Inferno of starting something new and get to see the nine circles of Paradiso, but I feel confident in these two who are guiding me ;-). Take care, good luck, and many thanks,

- Agustin Schapira

His path of openness, transparency, and humility has been very inspiring. Thanks, Peldi, and thanks very much to all for helping me up to here.

Now wish me luck!
