Posts Tagged ‘Stream Fusion’
Creating Stream Combinators in Haskell’s Stream Fusion Library
So I took a look at the Haskell Stream Fusion library the other day, and got the idea to write a new append
combinator that would merge the two streams in sort order. This seemed simple enough to code directly using Streams, but my first instinct is always to write the code using lists, and then translate it into the more complicated syntax. Here’s what a sorting merge function looks like over lists:
merge :: Ord a => [a] -> [a] -> [a] merge [] bs = bs merge as [] = as merge (a:as) (b:bs) | a < b = a : merge as (b:bs) | otherwise = b : merge (a:as) bs
We have two base cases where either one of the argument lists may be null, in which case we just return the other. For the recursive case, we just cons
the lesser of the two list heads onto the rest of the list, and leave the other list head attached to its list in-place. Simple and elegant.
So the Stream version should be just as easy, right? Let’s see.
mergeS_wrong :: Ord a => Stream a -> Stream a -> Stream a mergeS_wrong (Stream nexta sa0) (Stream nextb sb0) = Stream next (sa0, sb0) where next (sa0, sb0) = case (nexta sa0, nextb sb0) of (Done, sb) -> case sb of Done -> Done Skip sb' -> Skip (sa0, sb') Yield b sb' -> Yield b (sa0, sb') (sa, Done) -> case sa of Done -> Done Skip sa' -> Skip (sa', sb0) Yield a sa' -> Yield a (sa', sb0) (sa, sb) -> case sa of Done -> Done -- shouldn't happen Skip sa' -> Skip (sa', sb0) Yield a sa' -> case sb of Done -> Done -- shouldn't happen Skip sb' -> Yield a (sa', sb') Yield b sb' | a < b -> Yield a (sa', sb0) | otherwise -> Yield b (sa0, sb')
Looks like a wordier version of the first. We take the first element of each stream, and use a case expression to check each of our cases. The first two base cases are a little longer this time because we can’t just return the other stream, but instead have to either Skip
or Yield
over the remainder of the Stream. In the third case, we must Skip
over the first Stream until we Yield
a value, and then do the same for the second stream. We compare the two values, Yield
the lesser, and return the two remaining Streams.
The only problem is that this won’t compile. GHCi
gives us the following error message:
*Main> :load "/home/mike/Projects/Haskell_SVN/NumWords.hs" [1 of 1] Compiling Main ( /home/mike/Projects/Haskell_SVN/NumWords.hs, interpreted ) /home/mike/Projects/Haskell_SVN/NumWords.hs:59:53: Could not deduce (Data.Stream.Unlifted (s, s1)) from the context (Data.Stream.Unlifted s1) arising from a use of `Stream'
What’s this Data.Stream.Unlifted
type? Turns out that our Stream data type is encapsulated by a universally quantified type s
that is an instance of the hidden type class Unlifted
. The standard Haskell pair type (,)
isn’t, unfortunately, an exposed instance of this class. Which means that we can’t make a Stream out of a pair of Streams, as we did on the second line of code with Stream next (sa0, sb0)
.
Or so I thought. That is, until I realized (after much hand wringing) that the library did expose a data type that would allow us to use our own types — or, indeed, all of the standard Haskell types, such as pair. The type we need is
data L a = L a instance Unlifted (L a) where expose (L _) s = s
Now we have a wrapper data type that acts as a dummy instance of class Unlifted
! So (after about four hours of head scratching), we can make the following small changes to our code:
mergeS :: Ord a => Stream a -> Stream a -> Stream a mergeS (Stream nexta sa0) (Stream nextb sb0) = Stream next (L (sa0, sb0)) where next (L (sa0, sb0)) = case (nexta sa0, nextb sb0) of (Done, sb) -> case sb of Done -> Done Skip sb' -> Skip (L (sa0, sb')) Yield b sb' -> Yield b (L (sa0, sb')) (sa, Done) -> case sa of Done -> Done Skip sa' -> Skip (L (sa', sb0)) Yield a sa' -> Yield a (L (sa', sb0)) (sa, sb) -> case sa of Done -> Done -- shouldn't happen Skip sa' -> Skip (L (sa', sb0)) Yield a sa' -> case sb of Done -> Done -- Shouldn't happen Skip sb' -> Yield a (L (sa', sb')) Yield b sb' | a < b -> Yield a (L (sa', sb0)) | otherwise -> Yield b (L (sa0, sb'))
All we had to do was wrap our Stream pairs in the type constructor L
to give our Stream pairs access to “free” instance deriving from the Unlifted
class. Easy? Well, once you notice that unassuming data L a = L a
in the documentation. But hey, it sure beats trying to do something like this in C!