Mike McClurg

OCaml Design Pattern: Easy functorization refactoring for unit testing

I’ve recently stumbled upon a useful OCaml design pattern for functorizing an existing module, without changing the way existing clients of that module use it. This is really useful for stubbing out dependencies when you’re refactoring code for unit tests. Here’s a simplified example from a bit of code that I refactored today.

module Ovs = struct

  let vsctl args = call_cmd "ovs-vsctl" args

  let create_bond name bridges interfaces props =
    let prop_args = make_bond_properties name props in
    let iface_args = do_stuff_with interfaces prop_args in
    let bridge_args = do_other_stuff_with bridges in
    vsctl [ bridge_args; iface_args ]

end

The above code is loosely based on xen-api’s network daemon, a service which configures an XCP host’s networking. The above module is for the Open vSwitch backend, which uses the ovs-vsctl command line tool to configure the vSwitch controller. I just implemented some new functionality in the make_bond_properties function (not shown above), and I want to test that the ovs-vsctl command is being invoked properly.

I want to be able to test that the list of arguments we’re passing to the vsctl function is correct, but create_bond calls vsctl directly with the arguments, so I can’t “see” them in my test case. We could just split create_bond into create_bond_arguments and do_create_bond functions. But if we ever write unit tests for the other 20 functions that call vsctl, we’ll have to do the exact same thing for all of them. Instead, we’ll refactor the Ovs module so that we can pass in an alternative implementation of vsctl.

module Ovs = struct
  module Cli = struct
    let vsctl args = call_cmd "ovs-vsctl" args
  end

  module type Cli_S = module type of Cli

  module Make(Cli : Cli_S) = struct

    (* continue with original definition of Ovs *)
    let create_bond name bridges interfaces props =
      let prop_args = make_bond_properties name props in
      let iface_args = do_stuff_with interfaces prop_args in
      let bridge_args = do_other_stuff_with bridges in
      Cli.vsctl [ bridge_args; iface_args ]

  end

  include Make(Cli)

end

You can see that we’ve just inserted a few lines of code around the original Ovs module implementation. We moved the vsctl function into Ovs.Cli, and we moved the rest of the definition of Ovs into the functor Ovs.Make. The neat bit is at the end, where we include Make(Cli) inside of module Ovs. This applies the Make functor to a module containing the original definition of the vsctl function, and includes the result in the Ovs definition. Now we have a functorized module, which we can customize in our test cases. Yet we haven’t changed the interface of the original Ovs module at all, so we don’t have to change any of the code that depends on it! This trick works on toplevel modules in *.ml files, too.

Now our testcase is easy to write:

module TestCli = struct
  let vsctl args = assert_args_are_correct args
end

let test_create_bond =
  let module Ovs = Ovs.Make(TestCli) in
  let test_properties = ... in
  Ovs.create_bond "bond0" "xapi0" ["eth0"; "eth1"] test_properties

So the assertion is actually performed by TestCli’s vsctl function, which we injected into the Ovs module using the Ovs.Make functor. Easy!

One more thought: I had considered using first-class modules to inject test dependencies into a module like Ovs. This would have worked as well, but I couldn’t think of a good way to do it that wouldn’t have required rewriting all the Ovs call sites (I didn’t think too hard about it, so maybe it’s easier than I think). Also, first-class modules are really meant for swapping dependencies at runtime, more akin to what Java programmers mean when they say “dependency injection.” In this case, we really would rather have the dependency injection done at compile time, so there’s no real benefit to using first-class modules.
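For comparison, a first-class-module version might have looked roughly like this. This is a hypothetical sketch, not code from the daemon: build_bond_args is a made-up helper standing in for the argument-building code, and Cli_S is the signature from above.

```ocaml
(* Hypothetical sketch: injecting the CLI as a first-class module
   value instead of via a functor. *)
module type Cli_S = sig val vsctl : string list -> unit end

let default_cli : (module Cli_S) =
  (module struct let vsctl args = call_cmd "ovs-vsctl" args end)

(* Every function that touches vsctl now needs the extra parameter: *)
let create_bond cli name bridges interfaces props =
  let module Cli = (val cli : Cli_S) in
  Cli.vsctl (build_bond_args name bridges interfaces props)
```

A test would pass its own (module TestCli : Cli_S) value, but every existing call site would have to be rewritten to supply the cli argument, which is exactly the churn the functor trick avoids.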

Written by mcclurmc

December 18, 2012 at 7:31 pm

Posted in OCaml, Unit testing

Getting started with Go and Juju

So this title probably doesn’t even parse correctly if you haven’t heard of Google’s Go language or Ubuntu’s Juju service orchestration software. I’ve been toying with the idea of writing a Juju service provider for the XenAPI. This would allow people to use Juju to provision services on a pool of XCP or XenServer hosts. Since the Juju team have recently completed the rewrite of Juju from Python to Go, this gives me the perfect reason to teach myself Go.

And I had the perfect opportunity to learn both Go and Juju at last week’s Ubuntu Developer Summit in Copenhagen, where I got to meet most of the Juju core team. After one of the Juju planning meetings, Roger Peppe gave me a hands-on intro to setting up a Go environment and building Juju. Now I’ll share the notes I took, to help would-be Go/Juju hackers get started. (This isn’t an introduction to the Go language, by the way, just instructions for setting up the tools necessary for compiling Juju. Check out the Go Tour for help learning Go.)

The first thing to do, of course, is to install Go:

# Install go
sudo aptitude install golang golang-mode

I haven’t been able to get Vim to highlight my Go files, but Emacs and golang-mode work. I’ve managed to get golang-mode confused a bit, though, including one weird hang that forced me to kill Emacs.

It turns out that Go isn’t just a compiler, it’s an entire packaging system too. So before you start using Go, you’ll need to set up your GOPATH environment variable to point to some directory (any directory will do; I put mine in ~/go). This is where all your Go source code and compiled binaries will live.

# Set up go environment (add this to .bashrc)
export GOPATH=~/go
export PATH=${GOPATH}/bin:${PATH}

Here’s where the package management system comes in. Instead of doing a ‘bzr clone lp:juju-core’ like I imagined I would, you tell Go to go fetch your source code:

# Get juju-core and its dependencies
go get -v launchpad.net/juju-core/...

This will tell Go to go to Launchpad and fetch everything under the juju-core path. It will even use bzr to clone the source repo, so you can make changes and commit back to Launchpad. Note the use of the ‘…’ wildcard feature. This will match any path after juju-core/. You could also put the ‘…’ characters in the middle of a path, as in foo/…/bar, and it will match paths such as foo/baz/bar and foo/boz/buz/bar. This reminds me a bit of XPath’s ‘//’ search feature.

Juju-core isn’t the only thing that Go downloaded for us. Go ahead and check out your $GOPATH/src directory. There should be a directory called launchpad.net which contains juju-core, but there will also be another directory called labix.org. Where did this come from? It turns out that Go will scan the source files it downloads looking for external dependencies. If those dependencies are prefixed with a network address, go will try to download those as well.

I’m not sure how I feel about this, to be honest. It’s pretty cool that it can just go out and find dependencies. But how does it deal with version pinning? I’m sure there’s a good answer to this out there somewhere. (Update: Rog Peppe tells me that the standard way to do this is to encode versions in the library path. I wonder if this would work with a git branch on Github…)

The ‘go get’ command both fetched the Juju source and built and installed it. To see how to build Go source on your own, we’ll nuke the compiled libraries and tell Go to build and install them again.

# Build and install Juju
rm -rf ${GOPATH}/pkg/linux_amd64/launchpad.net/
cd ${GOPATH}/src/launchpad.net/juju-core
go install ./...

One of the best things about Go (and the primary reason Rob Pike developed Go, actually) is the fast compile times. That command above took less than 9 seconds on my laptop. That’s all it takes for Go to compile all the juju-core source code, and all its external dependencies. Doing a quick and dirty `find ${GOPATH}/src -name "*.go" | xargs sed '/^\s*\/\//d;/^\s*$/d' | wc -l` gives me 72135 lines of code, so 9 seconds sounds pretty good.
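Spelled out, that quick-and-dirty count looks like this. It’s a rough measure only: it drops blank lines and whole-line // comments, but not /* */ blocks. A throwaway demo tree stands in for ${GOPATH}/src so the snippet runs anywhere:

```shell
# Rough Go line count: drop blank lines and whole-line // comments.
# A temp directory with one demo file stands in for ${GOPATH}/src.
src=$(mktemp -d)
printf 'package main\n\n// a comment\nfunc main() {}\n' > "$src/a.go"
find "$src" -name '*.go' \
  | xargs sed '/^[[:space:]]*\/\//d;/^[[:space:]]*$/d' \
  | wc -l    # two code lines survive the filter in the demo file
```

Point `src` at your own checkout to count a real tree.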

Go also has a great online help feature. Not only can you get help on the core tools through man pages, but you can get library docs — even for libraries you just downloaded — using the ‘godoc’ command. Here’s documentation for the entire ec2 package:

# Check out documentation for a package
go doc launchpad.net/.../ec2

And here’s documentation for a particular exported name of that package:

# Documentation for a type or function (notice we call godoc directly)
godoc launchpad.net/goamz/ec2 Instance

If you’d rather read your package docs on the web, a great resource is http://go.pkgdoc.org/, which has docs for the standard library as well as many, many third party libraries.

Go will also format your code for you. You can run ‘go fmt <file.go>’ to format a file in place, or use ‘M-x gofmt’ in Emacs. I really should bind that to a shortcut.

# Reformat your code
go fmt <file.go>

And on top of all that, Go also has a built-in test framework. Files which end with _test.go are special and are considered test cases. You can use the command ‘go test’ to build and execute these tests. There is a built-in unit testing library which I won’t cover here (because I don’t know how to use it yet).

# Run tests in a given directory
go test

So that’s all you need to get started with Go development for Juju. For a good tutorial on bootstrapping Juju, you can take a look at Juju’s Getting Started document.

I’ll hopefully follow this blog up with a status report on my XenAPI service provider for Juju, and how I’ve modified Dmitry Maksimov’s XMLRPC library to work with the XenAPI.

Written by mcclurmc

November 5, 2012 at 6:07 am

Posted in Programming

Creating Stream Combinators in Haskell’s Stream Fusion Library

So I took a look at the Haskell Stream Fusion library the other day, and got the idea to write a new append combinator that would merge the two streams in sort order. This seemed simple enough to code directly using Streams, but my first instinct is always to write the code using lists, and then translate it into the more complicated syntax. Here’s what a sorting merge function looks like over lists:

merge :: Ord a => [a] -> [a] -> [a]
merge []     bs                 = bs
merge as     []                 = as
merge (a:as) (b:bs) | a < b     = a : merge as (b:bs)
                    | otherwise = b : merge (a:as) bs

We have two base cases where either one of the argument lists may be null, in which case we just return the other. For the recursive case, we just cons the lesser of the two list heads onto the rest of the list, and leave the other list head attached to its list in-place. Simple and elegant.
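As a quick sanity check, here’s how merge should behave on sorted inputs (the definition is repeated so the snippet runs standalone):

```haskell
-- Sanity check for the list-based merge above.
merge :: Ord a => [a] -> [a] -> [a]
merge []     bs                 = bs
merge as     []                 = as
merge (a:as) (b:bs) | a < b     = a : merge as (b:bs)
                    | otherwise = b : merge (a:as) bs

main :: IO ()
main = do
  print (merge [1,3,5] [2,4,6] :: [Int])  -- [1,2,3,4,5,6]
  print (merge [] [7,8] :: [Int])         -- [7,8]
```

Note that on ties the guard falls through to otherwise, so equal elements come from the right-hand list first.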

So the Stream version should be just as easy, right? Let’s see.

mergeS_wrong :: Ord a => Stream a -> Stream a -> Stream a
mergeS_wrong (Stream nexta sa0) (Stream nextb sb0) = Stream next (sa0, sb0)
    where
      next (sa0, sb0) =
          case (nexta sa0, nextb sb0) of
            (Done, sb) ->
                case sb of
                  Done        -> Done
                  Skip sb'    -> Skip    (sa0, sb')
                  Yield b sb' -> Yield b (sa0, sb')
            (sa, Done) ->
                case sa of
                  Done        -> Done
                  Skip sa'    -> Skip    (sa', sb0)
                  Yield a sa' -> Yield a (sa', sb0)
            (sa, sb) ->
                case sa of 
                  Done        -> Done -- shouldn't happen
                  Skip sa'    -> Skip    (sa', sb0)
                  Yield a sa' ->
                      case sb of
                        Done                    -> Done -- shouldn't happen
                        Skip sb'                -> Yield a (sa', sb')
                        Yield b sb' | a < b     -> Yield a (sa', sb0)
                                    | otherwise -> Yield b (sa0, sb')

Looks like a wordier version of the first. We take the first element of each stream, and use a case expression to check each of our cases. The first two base cases are a little longer this time because we can’t just return the other stream, but instead have to either Skip or Yield over the remainder of the Stream. In the third case, we must Skip over the first Stream until we Yield a value, and then do the same for the second stream. We compare the two values, Yield the lesser, and return the two remaining Streams.

The only problem is that this won’t compile. GHCi gives us the following error message:

*Main> :load "/home/mike/Projects/Haskell_SVN/NumWords.hs"
[1 of 1] Compiling Main             ( /home/mike/Projects/Haskell_SVN/NumWords.hs, interpreted )

/home/mike/Projects/Haskell_SVN/NumWords.hs:59:53:
    Could not deduce (Data.Stream.Unlifted (s, s1))
      from the context (Data.Stream.Unlifted s1)
      arising from a use of `Stream'

What’s this Data.Stream.Unlifted type? Turns out that our Stream data type hides its state behind an existentially quantified type s that must be an instance of the hidden type class Unlifted. The standard Haskell pair type (,) isn’t, unfortunately, an exposed instance of this class. Which means that we can’t make a Stream out of a pair of Streams, as we did on the second line of code with Stream next (sa0, sb0).

Or so I thought. That is, until I realized (after much hand wringing) that the library did expose a data type that would allow us to use our own types — or, indeed, all of the standard Haskell types, such as pair. The type we need is

data L a = L a
instance Unlifted (L a) where
  expose (L _) s = s

Now we have a wrapper data type that acts as a dummy instance of class Unlifted! So (after about four hours of head scratching), we can make the following small changes to our code:

mergeS :: Ord a => Stream a -> Stream a -> Stream a
mergeS (Stream nexta sa0) (Stream nextb sb0) = Stream next (L (sa0, sb0))
    where
      next (L (sa0, sb0)) =
          case (nexta sa0, nextb sb0) of
            (Done, sb) ->
                case sb of
                  Done        -> Done
                  Skip sb'    -> Skip    (L (sa0, sb'))
                  Yield b sb' -> Yield b (L (sa0, sb'))
            (sa, Done) ->
                case sa of
                  Done        -> Done
                  Skip sa'    -> Skip    (L (sa', sb0))
                  Yield a sa' -> Yield a (L (sa', sb0))
            (sa, sb) ->
                case sa of 
                  Done        -> Done -- shouldn't happen
                  Skip sa'    -> Skip    (L (sa', sb0))
                  Yield a sa' ->
                      case sb of
                        Done                    -> Done -- Shouldn't happen
                        Skip sb'                -> Yield a (L (sa', sb'))
                        Yield b sb' | a < b     -> Yield a (L (sa', sb0))
                                    | otherwise -> Yield b (L (sa0, sb'))

All we had to do was wrap our Stream pairs in the type constructor L to give our Stream pairs access to “free” instance deriving from the Unlifted class. Easy? Well, once you notice that unassuming data L a = L a in the documentation. But hey, it sure beats trying to do something like this in C!

Written by mcclurmc

March 21, 2010 at 4:42 pm

Posted in Programming

Java Concurrency Annotations

I’ve been reading a series of papers by Chandrasekhar Boyapati on extensions to the Java type system. I found his papers on ensuring race-free programs, by specifying that objects are either immutable, thread-local, or referenced by a unique pointer, particularly interesting. There’s also the paper A Type and Effect System for Deterministic Parallel Java, from OOPSLA 2009. I’m fascinated by the idea of creating a type system that could ease the burden of writing threaded programs, and this seems like a really promising idea to me.

I’d like to combine this approach to concurrency with Hybrid Type Checking. While I’d prefer to do as much at compile time as possible, I have a suspicion that we’ll always need to do some locking and unlocking at runtime, and that a system using both static types and runtime contracts might be our best bet.

I’m taking a stab at implementing these ideas in Java — but instead of modifying the compiler itself, I’ll be writing an annotation processor that will run before the compiler. The idea is that we could extend the Java type system using annotations. We could even go so far as to generate code at compile time that could do either runtime contract checking, or even lock and thread management. This is similar to the project Java Defensive Programming. If I can get this project off the ground, I’ll have to see if Federico would like to incorporate some of my code in his project.

Written by mcclurmc

February 12, 2010 at 3:21 pm

Posted in Projects, Thoughts

XML for Resumes

I hate writing resumes because there are no good tools for the job. I would prefer to use an open source word processor, such as Open Office, but companies often ask for MS Word format, which OO doesn’t do particularly well. I’ve thought about LaTeX, but unless you’re doing a lot of math that’s kind of overkill. Besides, I want to be able to keep the content of my resume separate from the formatting of the resume. I also want good revision control, and I want to be able to select content for my resume based on the particular job I’m applying for (such as leaving off my DoD clearance from non-DoD job applications).

All of this sounds like a job for XML – though I’m sure some would find this more overkill than LaTeX. But with a simple XML language, or perhaps even XHTML with class and span tags, I think I can make this happen. There are already two open “standards” for the job, XML Resume and hResume microformats, but I use the word “standards” very loosely. The XML Resume project on SourceForge hasn’t been updated since 2005, and I’m not convinced that the microformats solution completely solves the problem of keeping content and formatting orthogonal – though it gives us the great benefit of having a fully indexable online resume.

My idea is to use either XML Resume or hResume (or some bastard of the two) to write the content of the resume, but then use an OO plugin to import the resume language into OO, where we can use its great formatting tools. If I can get this rolling, we could also write a corresponding plugin for MS Word, so that I can send off resumes to Microsoft shops.

Now that I think about it, this is basically the model-view-controller pattern. The model is the XML representation of the resume, and all the different data points that you may want to include at some point in a real resume. The word processor is the view, where you can make your resume look as pretty as you want. The controller is the plugin that allows you to have your pretty resume template backed by data in the XML file – and, if the plugin does what I want it to do, display certain bullet points while hiding others, change the order of items, etc.

I’d really like to get this project started, but every time I think about it it’s because I need to work on my resume. And so I find myself in the programmer’s endless dilemma: do I build a tool to solve the problem for me, or do I just solve the problem? My resume is looking pretty good, so I hope I’ll find some time to work on this plugin.

Written by mcclurmc

February 9, 2010 at 7:01 pm

Posted in Projects, Thoughts

Emacs!

I love Emacs. I’ve been using it ever since I took my first programming language theory class in college. I had played around with it before then, but it took my professor’s recommendation and half a class’s worth of tutorial to get me hooked.

We studied Scheme in that class, which is why my prof recommended that we all use Emacs. Scheme is a variant of Lisp, which is the language that Emacs is built around. And Elisp, as it’s called, is a great embedded language for customizing Emacs. You could think of it as Visual Basic for Applications, which is Microsoft’s language for customizing Office products, but much simpler and more powerful to use (as long as parentheses don’t scare you).

I’ve only got one or two friends that might consider themselves to be Emacs gurus, and one of them wrote me the other day with a challenge. “I believe this problem is trivial,” he said, which is never a good sign, “but I’ve only had 20 minutes to think about it and I’ve got to run.” He had been working on an Emacs library to help him configure a massive Ant build script. Part of his build process involved him logging into numerous test and dev machines, and he wanted to automate that in Emacs. He had written a list of interactive functions that would allow him to rlogin into each machine by typing M-x machine (M stands for Alt, or Meta, in Emacs, and is used to specify commands from the keyboard). These functions were each defined with syntax like the following:

(defun machine1 ()
  (interactive)
  (rlogin "machine1" "*machine1*"))

This list of functions had grown quite a bit in the months since he implemented this feature, and he wanted to find a way to automatically define a function that would log him into a specified machine by simply typing M-x machine. Easy, right?

A first attempt

My first approach to this problem was to create an association list of functions and their names. We would create a single interactive function that would select and run the appropriate generated function.

(defun my-login (fn)
  (interactive "sWhich function? ")
  (funcall
    (cdr (assoc (read fn) my-fnassoc))))

There are a few things going on in this function. The (interactive "sWhich function? ") statement tells Emacs that this function can be called interactively using the M-x my-login syntax. The strange string “sWhich function? ” is just a format string telling Emacs to prompt the user for a string argument (the “s”) with the prompt “Which function? “. If the function took several arguments, we would simply separate each format string with a “\n” character in order to prompt for each of our arguments. The read function just turns our argument string into a symbol, and the call (cdr (assoc (read fn) my-fnassoc)) looks the symbol up in the association list my-fnassoc. funcall executes the returned function. Now we just need to build my-fnassoc.

(defun mkfunls (name)
  (let* ((host (prin1-to-string name))
         (buff (format "*%s*" host)))
    `(,name . (lambda () (rlogin ,host ,buff)))))

This function creates a single association pair of the form (host . login-function). Since our name parameter is a symbol, we need to convert it to a string using prin1-to-string. The Emacs rlogin command allows us to specify the name of the buffer in which we spawn the remote shell. We can use the format function to create a standard buffer name for each rlogin session we create.

The final line of mkfunls takes advantage of Lisp’s backquoting feature. The syntax looks a little strange at first, but if you can put up with Lisp’s other oddities then there’s no sense not learning about backquoting too. Quoting in Lisp is accomplished with either the quote special form or the abbreviation ' (a single apostrophe). Quoting prevents evaluation of the form being quoted, so evaluating (+ 1 2) returns the value 3, while evaluating (quote (+ 1 2)), or the equivalent '(+ 1 2), returns the value (+ 1 2) instead.

Backquoting accomplishes the same thing, but allows the programmer to selectively specify which parts of a list are to be evaluated and which parts aren’t. It’s called backquoting because we use a backwards apostrophe (` — found under the tilde) instead of quote or the regular apostrophe. We select the forms we want to evaluate using a comma. So the line `(,name . (lambda () (rlogin ,host ,buff))) evaluates the name, host, and buff variables but quotes all the rest of the symbols. So the call (mkfunls 'machine1) evaluates to (machine1 lambda nil (rlogin "machine1" "*machine1*")). Now all that’s left is to create our host/rlogin association list and we’ll be able to log in to remote hosts with a single command.

(setq my-fnassoc (mapcar 'mkfunls '(m1 m2 m3)))

This creates an association list with the host names m1, m2, m3 and sets the variable my-fnassoc. The function mapcar applies its first argument, in this case the function mkfunls, to each element of its second argument, which must be a list.

Now when we type M-x my-login we receive the prompt “Which function? ”, to which we can reply with any of the hosts we specified in the my-fnassoc list above.

But this doesn’t quite solve my buddy’s problem, does it? His list of login functions, while a little cumbersome to maintain, defined interactive functions at the top level, which allowed him to run M-x machine1 instead of the more indirect call to M-x my-login above.

A better solution

What we want is to automatically generate named, interactive functions. My initial thought was that this was a perfect task for macros, but it turns out that we don’t even have to bother with macros to make this work. Instead, we’ll use the backquoting syntax from above to build up defuns and then evaluate them.

(defun mkdefun (name)
  (let* ((host (prin1-to-string name))
         (buff (format "*%s*" host)))
    (eval
     `(defun ,name () (interactive) (rlogin ,host ,buff)))))

This is very similar to our mkfunls function above. Instead of returning an association list, we’re evaluating a defun. We use the backquote syntax to selectively evaluate the name, host, and buff variables. eval then evaluates the defun for us and creates a new top level definition. We can then map over a list of function names like we did before:

(mapc 'mkdefun '(m1 m2 m3))

mapc is like mapcar, except that it doesn’t return a list of the results of the function evaluations. This is useful when we only care about a function’s side effects and not its results.

So now we have a simple method to automatically generate repetitive Emacs commands. All we have to do is append new function names to the list in the mapc command and we have a new function!

Written by mcclurmc

January 19, 2010 at 1:43 am

Posted in Programming
