Pensieve

The technology

  • Log listener: lightweight Java servlets distributed across high-availability Tomcat clusters
  • Client libraries that sent logs to the listener: ActionScript, JavaScript, PHP, and Java
  • Data storage: a blend of local filesystem, HDFS, Vertica (a columnar database), and MySQL
  • Analysis ETL: scheduled transformations in Pig, R, and SQL
  • Analysis website: Django (Python) behind Apache on the server side; JavaScript with jQuery and D3 in the browser
The story

This is the internal product I ran for Disney, so I can't show it to you. Like all work I've done for hire, this was someone else's baby first, and I adopted it. But this one in particular was a diamond in the rough.

The gem of an idea was already there: an online SQL client. Expose big databases, let the user write query code, serve results, publish a link to the query. The power was in the published link. When a user followed the link, he saw not only the query result but also the code that generated it. He could modify the code on the page and run it again. An author could also parameterize her query to make it more reusable; the web app would prompt the user for param values and substitute them into the query.
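
As a rough illustration of that flow, here is a minimal sketch in Python. The placeholder syntax, names, and naive quoting are my assumptions for the example, not the actual Pensieve code:

    import re

    PARAM_PATTERN = re.compile(r":(\w+)")   # placeholders like :game, :start

    def find_params(query):
        """Return the distinct parameter names referenced in a query."""
        return sorted(set(PARAM_PATTERN.findall(query)))

    def substitute(query, values):
        """Replace each :name placeholder with a quoted value."""
        def repl(match):
            raw = values[match.group(1)]
            return "'" + raw.replace("'", "''") + "'"   # naive SQL quoting
        return PARAM_PATTERN.sub(repl, query)

    query = "SELECT day, revenue FROM sales WHERE game = :game AND day >= :start"
    print(find_params(query))   # ['game', 'start']
    print(substitute(query, {"game": "pirates", "start": "2013-01-01"}))

In production the escaping would have to follow the target database's rules; the point is only the shape of the flow: find the placeholders, collect values, substitute, run.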

Quietly, a community was growing. Analysts developed queries and shared them across the company, improving the queries over time. Sometimes our central analysts would optimize a query written by a game team, and sometimes a game team would bring insight to a general query. For me, the network of collaborating users was the product. I wanted to cultivate it by improving the online software.

Most analysis suites don't encourage their users to learn by reading others' work, tinker with the models, break things, and ultimately build upon the work of peers. We did. Building and responding to that community, providing the user-interface and programmatic features to help them, is what I became proud of. For example:

  • I provided URLs to embed charts and tables in users' own apps.
  • I made the parameterization typesafe and easier to use (see the sketch after this list).
  • I tried a few ways of making good content easier to find. (No clear winner there.)
  • I profiled query performance and flagged broken queries.
  • A legacy (read: super-privileged) customer was opening a direct connection to the database and inserting data; I got him to knock it off by helping him with the transition.
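
On the typesafe parameterization, the shape of the idea in a hypothetical sketch (the declared types and helper names are mine, not the production code) is that each parameter carries a declared type, and values are validated and rendered per type before substitution:

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class Param:
        name: str
        type: str   # 'int', 'date', or 'string'

    def render(param, raw):
        """Validate a raw form value against its declared type and
        render it as a SQL literal; bad input raises before any query runs."""
        if param.type == "int":
            return str(int(raw))                       # ValueError on junk
        if param.type == "date":
            return "'%s'" % date.fromisoformat(raw)    # ValueError on junk
        return "'" + raw.replace("'", "''") + "'"      # escaped string

    params = [Param("game", "string"), Param("start", "date")]
    values = {"game": "pirates", "start": "2013-01-01"}
    print({p.name: render(p, values[p.name]) for p in params})

The win is that a malformed date fails loudly at the form instead of silently inside the warehouse.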

A special point of pride: I merged in another web-based analytic tool. This separate tool used a different query language, returned only time-series data, and displayed it with an interactive GUI, but the basic model was the same as my main product's. Combining them gave each product's user base the benefits of the other. (Bonus advantage: one product is cheaper to maintain than two.)

Another great thing about the product was that it was part of a larger ecosystem, one that included logging libraries and ETL jobs. The players in that system trusted each other. I provided customer support for the logging libraries, maintained the high-traffic logging servers, and wrote the API for the message bus that fed the databases. It's surprising how much value there is in a custom API that does a very standard thing. Knowing your customers can go a long way.
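
To illustrate that point about a custom API for a standard thing, here is a hypothetical sketch of what such a thin client could look like; the class, transport, and field names are illustrative assumptions, not the real interface:

    import json
    import sys

    class LogBus:
        """Hypothetical thin client for a line-oriented message bus.
        The value is the narrow, well-named interface, not the transport."""

        def __init__(self, sink, topic):
            self.sink = sink      # anything with write(), e.g. a socket file
            self.topic = topic

        def send(self, event, **fields):
            """Publish one event as a JSON line on this client's topic."""
            record = {"topic": self.topic, "event": event, **fields}
            self.sink.write(json.dumps(record) + "\n")

    # Demo with stdout standing in for the bus connection:
    bus = LogBus(sys.stdout, topic="game_metrics")
    bus.send("level_complete", player_id=42, level=3)

The narrowness is the design choice: callers name an event and its fields and never touch sockets, framing, or routing.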