Posts from "July 2010"

Opower Labs

Maven-izing Google’s Data Client Java Library

  • By Tom Vaughan
  • July 27, 2010

OPOWER’s first Innovation Day was a lot of fun.  While waiting about 30 minutes for my primary project (building an AWS machine image for our Hudson slave) to complete, I got sucked into a different project: automatically downloading Google Analytics data for all the traffic on all the websites we host for our clients.

Clicking around Google’s well-documented “get started” guides, I found that Google hasn’t made the client JARs available in any Maven repos anywhere.  There is a (not-Google-sponsored) SourceForge project that points to a Sonatype repo, but instead of adding another third-party repo to our small but growing list, I figured it’d be easier for small proof-of-concept purposes to just manually install the Google JARs in OPOWER’s repo (thereby making them accessible to all our developers’ boxes).

It quickly became clear that the sheer number of JARs Google ships makes it faster and less error-prone to script the repo installation than to copy and paste commands manually.  Lo, my gift to the world:

  • Grab latest JARs from the “gdata-java” link here: http://code.google.com/p/gdata-java-client/downloads/list
  • Unzip on the server hosting your company’s mvn repo
  • cd down into ./gdata/java/lib
  • Make a file ‘installall.pl’ in that lib directory:
  • #!/usr/bin/perl -w
    
    use strict;
    
    # Run install.pl for every JAR in the current directory whose name
    # matches <artifactId>-<version>.jar
    my @jars = `ls *.jar`;
    chomp(@jars);
    foreach my $jar (@jars) {
      if ($jar =~ /^(.*)-([\d.]+)\.jar$/) {
        my $cmd = "./install.pl --artifactId $1 --version $2 --jar $jar";
        open(CMD, "$cmd|") or die "Could not execute $cmd: $!";
        while (<CMD>) {
          print $_;
        }
        close(CMD);
      }
    }
  • Still in that directory, make another file ‘install.pl’:
  • #!/usr/bin/perl -w
    
    use strict;
    use warnings;
    use Getopt::Long;
    
    sub usage();
    
    my $artifactId;
    my $version;
    my $jar;
    GetOptions('artifactId=s' => \$artifactId,
               'version=s'    => \$version,
               'jar=s'        => \$jar);
    usage() unless ($artifactId && $version && $jar);
    
    my $cmd = "mvn deploy:deploy-file " .
          "-DgroupId=com.google.gdata " .
          "-DartifactId=$artifactId " .
          "-Dversion=$version " .
          "-Dfile=$jar " .
          "-Dpackaging=jar " .
          "-DgeneratePom=true " .
          "-Durl=file:///opt/mvn_repo " .    # <-- change this for your env
          "-DrepositoryId=opower_local";     # <-- change this for your env
    print "executing command = $cmd\n";
    open(CMD, "$cmd|") or die "Could not exec $cmd: $!";
    while (<CMD>) {
        print $_;
    }
    close(CMD);
    
    sub usage() {
        print "Usage: ./install.pl --artifactId artifactId --version version --jar jarfile\n";
        print "Example: ./install.pl --artifactId gdata-webmastertools --version 1.0 --jar gdata-webmastertools-1.0.jar\n";
        exit 1;
    }
  • Don’t forget to chmod 755 *.pl
  • Then just run ./installall.pl and all those Google JARs should get installed in your repo

That assumes the “mvn” executable is in your path and that you run the scripts from within the same directory as all the JARs, but it’s easily modified for whatever your situation may be.
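To make the moving parts concrete, here’s the filename-to-coordinates parsing and the deploy command install.pl builds, sketched as a shell dry run (the leading echo keeps it from actually deploying; the repo URL is the example value from the script):

```shell
# Dry-run sketch of what the scripts do for one JAR (echo prints the command
# instead of running it). Filename pattern assumed: <artifactId>-<version>.jar
jar="gdata-analytics-2.1.jar"

artifactId="${jar%-*}"          # strip "-<version>.jar"   -> gdata-analytics
version="${jar##*-}"            # keep text after last "-" -> 2.1.jar
version="${version%.jar}"       # drop ".jar"              -> 2.1

echo mvn deploy:deploy-file \
  -DgroupId=com.google.gdata \
  -DartifactId="$artifactId" \
  -Dversion="$version" \
  -Dfile="$jar" \
  -Dpackaging=jar \
  -DgeneratePom=true \
  -Durl=file:///opt/mvn_repo
```

Drop the echo (and point -Durl at your own repo) to deploy for real.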

Note that, relative to ./lib, there are two JARs in ../deps/ that you should also install in your repo, because the Google JARs depend on them; without them you’ll hit ClassNotFoundExceptions in some runtime code paths.
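Those deps JARs can be fed through the same install.pl.  A shell sketch that prints the install.pl invocation for each JAR in a directory (the google-collect name in the comment is just the one the example POM below uses; actual filenames vary by release):

```shell
# Print an install.pl call for each JAR in a directory (dry run -- nothing is
# deployed). The sed patterns split <artifactId>-<version> at the first digit,
# so versions like 1.0-rc1 survive intact.
print_deps_installs() {
  for jar in "$1"/*.jar; do
    [ -e "$jar" ] || continue                    # guard against an empty glob
    name=$(basename "$jar" .jar)                 # e.g. google-collect-1.0-rc1
    artifactId=$(printf '%s\n' "$name" | sed 's/-[0-9].*$//')
    version=$(printf '%s\n' "$name" | sed 's/^.*-\([0-9].*\)$/\1/')
    echo "./install.pl --artifactId $artifactId --version $version --jar $jar"
  done
}

print_deps_installs ../deps
```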

Once they’re in, you should be able to bootstrap a Maven client project against a Google Data source with a POM not too dissimilar from:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>foo</artifactId>
  <packaging>jar</packaging>
  <version>1.0.0-SNAPSHOT</version>
  <name>Google Client Example</name>

  <scm>
    <developerConnection>foo</developerConnection>
  </scm>

  <dependencies>
    <dependency>
        <groupId>com.google.gdata</groupId>
        <artifactId>gdata-core</artifactId>
        <version>1.0</version>
    </dependency>
    <dependency>
        <groupId>com.google.gdata</groupId>
        <artifactId>gdata-analytics</artifactId>
        <version>2.1</version>
    </dependency>
    <dependency>
        <groupId>com.google.gdata</groupId>
        <artifactId>gdata-client</artifactId>
        <version>1.0</version>
    </dependency>
    <dependency>
        <groupId>com.google</groupId>
        <artifactId>google-collect</artifactId>
        <version>1.0-rc1</version>
    </dependency>
  </dependencies>
</project>
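Once deployed, each artifact lands in the repo under Maven’s standard directory layout (groupId dots become path segments).  A quick sketch of computing that path for the gdata-analytics coordinates above, using the repo root from install.pl:

```shell
# Maven standard repository layout: group dirs / artifactId / version / file.
groupId=com.google.gdata
artifactId=gdata-analytics
version=2.1

gpath=$(printf '%s' "$groupId" | tr . /)   # com.google.gdata -> com/google/gdata
path="$gpath/$artifactId/$version/$artifactId-$version.jar"
echo "$path"   # com/google/gdata/gdata-analytics/2.1/gdata-analytics-2.1.jar

# To verify on the repo host (root taken from install.pl's -Durl):
#   ls /opt/mvn_repo/$path
```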

Heisenberg’s Key Performance Indicators

  • By Tom Vaughan
  • July 26, 2010

I was trawling through reddit waiting for a JAR to build and found this blog post.  That post made a couple of claims that had crossed my mind a few months ago when our engineering director solicited feedback about using Key Performance Indicators for the dev team.  The post’s point wasn’t so much that the act of measuring a team changes its performance, but rather that when the team is aware of what’s being measured, it naturally “cheats” in ways that maximize the measured value.  The first comment on the post alerted me to a law I hadn’t known about, but which sounds intuitively correct:

Goodhart’s Law: Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.

Here at OPOWER, developers and product managers don’t deal with KPIs on a day-to-day basis.  I think it’s mostly a management-level “let’s start capturing some numbers over several months and see if there’s anything out of whack” kind of thing.

In terms of development KPIs, some of the things we thought it might be interesting to collect included:

  • story points delivered per iteration
  • % of story points (aggregate) done in an iteration that were planned at the beginning
  • % QA coverage of production (automated)
  • etc.

The “story points delivered per iteration” metric is, I think, precisely the kind of thing Goodhart was talking about when he made up his law.  That said, we are a business and we are expected to work and be productive, so it isn’t exactly satisfying for a VP or CEO to hear from their dev team, “sorry, you can’t measure us, because we’ll just skew the measurement to please you.”  So what’s management to do with their IT cabal?

Assuming you could normalize what story points mean across different teams, and assuming you can account for “point inflation,” and assuming you accurately tracked vacations, network outages, late requirements, and all the other stuff that goes along with uncertainty in exactly how much gets delivered in an iteration, must you also assume that the team gradually skews the number to meet the goal?  What if the goal were kept secret?  What if the team didn’t know they were being measured?  That’s hardly a way to foster a healthy relationship between management and product development.

One way around this dilemma is to find some totally objective metric in the software engineering process.  In other words, if a factory can apply KPIs to “number of sprockets coming off the assembly line such that each sprocket is within 0.1% tolerance,” what’s the analogous measurement in our iteration process, one that doesn’t easily fall prey to inflation or manipulation?  If you can find something like that, lemme know.  Until then, I won’t be holding my breath.

Another possible option would be to go ahead and publicly measure highly subjective KPIs anyway and ask the measurees (us) to be as objective as possible when measuring.  I.e., fly in the face of Goodhart’s Law.  After all, it’s just a Law.  Like De Morgan’s Law, except maybe with a bit more wiggle room.
