Mike McClurg

Posts Tagged ‘Stream Fusion

Creating Stream Combinators in Haskell’s Stream Fusion Library

leave a comment »

So I took a look at the Haskell Stream Fusion library the other day, and got the idea to write a new append combinator that would merge the two streams in sort order. This seemed simple enough to code directly using Streams, but my first instinct is always to write the code using lists, and then translate it into the more complicated syntax. Here’s what a sorting merge function looks like over lists:

merge :: Ord a => [a] -> [a] -> [a]
merge []     bs                 = bs
merge as     []                 = as
merge (a:as) (b:bs) | a < b     = a : merge as (b:bs)
                    | otherwise = b : merge (a:as) bs

We have two base cases where either one of the argument lists may be null, in which case we just return the other. For the recursive case, we just cons the lesser of the two list heads onto the rest of the list, and leave the other list head attached to its list in-place. Simple and elegant.

So the Stream version should be just as easy, right? Let’s see.

mergeS_wrong :: Ord a => Stream a -> Stream a -> Stream a
mergeS_wrong (Stream nexta sa0) (Stream nextb sb0) = Stream next (sa0, sb0)
    where
      next (sa0, sb0) =
          case (nexta sa0, nextb sb0) of
            (Done, sb) ->
                case sb of
                  Done        -> Done
                  Skip sb'    -> Skip    (sa0, sb')
                  Yield b sb' -> Yield b (sa0, sb')
            (sa, Done) ->
                case sa of
                  Done        -> Done
                  Skip sa'    -> Skip    (sa', sb0)
                  Yield a sa' -> Yield a (sa', sb0)
            (sa, sb) ->
                case sa of 
                  Done        -> Done -- shouldn't happen
                  Skip sa'    -> Skip    (sa', sb0)
                  Yield a sa' ->
                      case sb of
                        Done                    -> Done -- shouldn't happen
                        Skip sb'                -> Yield a (sa', sb')
                        Yield b sb' | a < b     -> Yield a (sa', sb0)
                                    | otherwise -> Yield b (sa0, sb')

Looks like a wordier version of the first. We take the first element of each stream, and use a case expression to check each of our cases. The first two base cases are a little longer this time because we can’t just return the other stream, but instead have to either Skip or Yield over the remainder of the Stream. In the third case, we must Skip over the first Stream until we Yield a value, and then do the same for the second stream. We compare the two values, Yield the lesser, and return the two remaining Streams.

The only problem is that this won’t compile. GHCi gives us the following error message:

*Main> :load "/home/mike/Projects/Haskell_SVN/NumWords.hs"
[1 of 1] Compiling Main             ( /home/mike/Projects/Haskell_SVN/NumWords.hs, interpreted )

/home/mike/Projects/Haskell_SVN/NumWords.hs:59:53:
    Could not deduce (Data.Stream.Unlifted (s, s1))
      from the context (Data.Stream.Unlifted s1)
      arising from a use of `Stream'

What’s this Data.Stream.Unlifted type? Turns out that our Stream data type is encapsulated by a universally quantified type s that is an instance of the hidden type class Unlifted. The standard Haskell pair type (,) isn’t, unfortunately, an exposed instance of this class. Which means that we can’t make a Stream out of a pair of Streams, as we did on the second line of code with Stream next (sa0, sb0).

Or so I thought. That is, until I realized (after much hand wringing) that the library did expose a data type that would allow us to use our own types — or, indeed, all of the standard Haskell types, such as pair. The type we need is

data L a = L a
instance Unlifted (L a) where
  expose (L _) s = s

Now we have a wrapper data type that acts as a dummy instance of class Unlifted! So (after about four hours of head scratching), we can make the following small changes to our code:

mergeS :: Ord a => Stream a -> Stream a -> Stream a
mergeS (Stream nexta sa0) (Stream nextb sb0) = Stream next (L (sa0, sb0))
    where
      next (L (sa0, sb0)) =
          case (nexta sa0, nextb sb0) of
            (Done, sb) ->
                case sb of
                  Done        -> Done
                  Skip sb'    -> Skip    (L (sa0, sb'))
                  Yield b sb' -> Yield b (L (sa0, sb'))
            (sa, Done) ->
                case sa of
                  Done        -> Done
                  Skip sa'    -> Skip    (L (sa', sb0))
                  Yield a sa' -> Yield a (L (sa', sb0))
            (sa, sb) ->
                case sa of 
                  Done        -> Done -- shouldn't happen
                  Skip sa'    -> Skip    (L (sa', sb0))
                  Yield a sa' ->
                      case sb of
                        Done                    -> Done -- Shouldn't happen
                        Skip sb'                -> Yield a (L (sa', sb'))
                        Yield b sb' | a < b     -> Yield a (L (sa', sb0))
                                    | otherwise -> Yield b (L (sa0, sb'))

All we had to do was wrap our Stream pairs in the type constructor L to give our Stream pairs access to “free” instance deriving from the Unlifted class. Easy? Well, once you notice that unassuming data L a = L a in the documentation. But hey, it sure beats trying to do something like this in C!

Written by mcclurmc

March 21, 2010 at 4:42 pm

Posted in Programming

Tagged with ,