Saturday, March 29, 2008

Returning '(()) Considered Difficult

Suppose you want a function to find all subsequences of a list with a certain length. For instance, the subsequences of (1...5) with length 2 are as follows.
(1 2) (1 3) (1 4) (1 5)
(2 3) (2 4) (2 5)
(3 4) (3 5)
(4 5)
This is not a red-herring example cooked up just to make Java look bad. This actually came up at work! I did something along these lines. Yuck.
private static <T> List<List<T>> subn(int n, List<T> li) {
List<List<T>> ret = new ArrayList<List<T>>();
if (n == 0) {
ret.add(new ArrayList<T>());
return ret;
}
if (li.isEmpty()) {
return ret;
}
T x = li.get(0);
List<T> xs = li.subList(1, li.size());
for (List<T> sub : subn(n-1, xs)) {
sub.add(0, x);
ret.add(sub);
}
ret.addAll(subn(n, xs));
return ret;
}
However, that was not the way my thoughts originally took shape. I think in Lisp.
(def subn (n li)
(if (is 0 n) '(())
(no li) '()
(join (map [cons (car li) _] (subn (- n 1) (cdr li)))
(subn n (cdr li)))))
Then I moved it to Haskell for greater clarity.
subn 0 _      = [[]]
subn _ [] = []
subn n (x:xs) = map (x:) (subn (n-1) xs) ++ subn n xs
With great pain, I converted those three lines of Haskell into the ugly splatter of Java you saw above. Later on, I started playing around with Scala and gradually compacted the Java into something strikingly similar to its Haskell ancestor.
def subn[T](n: int, li : List[T]) : List[List[T]] = (n,li) match {
case (0,_) => List(Nil)
case (_,Nil) => Nil
case (_,x::xs) => (subn(n-1, xs) map (x :: _)) ::: subn(n, xs)
}
The title of today's post refers to the inordinate amount of trouble it is to simply return "The list containing the empty list" in Java. I tried such monstrosities as these:
Collections.singletonList(Collections.EMPTY_LIST) // List<List>
Arrays.asList(Collections.emptyList()) // List<List<Object>>
Collections.singletonList(new ArrayList<T>()) // List<ArrayList<T>>
Collections.singletonList((List<T>)new ArrayList<T>()) // List<List<T>>
The last option does give the correct type, but it causes a compiler warning and is not much less cumbersome than the naive approach, i.e.:
List<List<T>> ret = new ArrayList<List<T>>();
ret.add(new ArrayList<T>());
return ret;
So here's a challenge for you Java developers out there. What would you change about the admittedly sloppy Java code at the beginning of this post? I'm issuing an all points bulletin to the refactoring mavericks (Mike, James), the simple-design visionaries (Steve), and the speed-freak optimizers (Dan). I wouldn't even mind seeing a C# version (Ethan), or an improvement on the Haskell (Eric).

Incidentally, I know some people think recursion is hard to read. I'd love to see an iterative solution as well.

P.S. I realize that the comment boxes don't hold code blocks very well. Feel free to link to a solution.


Updates:
Based on your input, I am now happy to give three different ways of returning '(()) in one line. I am only including ways that would work in the context I gave (i.e. the inner list must be mutable). The last is my favorite because the others cause compiler warnings.
return Arrays.<List<T>>asList(new ArrayList<T>());
// Type safety : A generic array of ArrayList is created for a varargs parameter

return new ArrayList<List<T>>() {{add(new ArrayList<T>());}};
// The serializable class does not declare a static final serialVersionUID...
// Ricky points out that this is an "Eclipse warning".

return Collections.<List<T>>singletonList(new ArrayList<T>());
// No warnings :)
Eric has blessed us with a version in Factor (not to be confused with my aversion to Factor). He also added a Haskell version using list comprehensions to achieve better laziness properties.

Meanwhile, back in Javaland, Dan suggested using SortedSet, particularly the subSet method. This seems to be a dead end in this case, but is certainly a useful corner of the Collection hierarchy.

After briefly attaining List Enlightenment, Thomas produced an implementation in Java with no recursion. Aside from being much longer and totally opaque, it also has worse performance charactoristics. It essentially produces every subsequence and then filters by length, similar to the Haskell one-liner suggested in the comments.

Wednesday, March 26, 2008

Speaker Spotlight - Simon Peyton-Jones

Simon Peyton-Jones was one of the designers of the Haskell programming language. Here are links some of his presentations.

A Taste of Haskell
Introduction to Haskell with XMonad as an example
(video 1) (video 2) (slides)

Composing Contracts - An Adventure in Financial Engineering
Very interesting case study on domain specific languages
(video) (slides)

Nested Data Parallelism in Haskell
Beating FORTRAN is the name of the game!
(video) (slides)

Taming Effects - The Next Big Challenge
The virtues of purity in functional languages
(video) (slides)

Transactional Memory for Concurrency
OSCON 2007 Keynote
(video) (slides)

Programming in the Age of Concurrency: Software Transactional Memory
Interview with Simon and Tim Harris about Transactional Memory
(video)

Type-driven Testing
Demonstration of the Haskell library QuickCheck
(video) (slides)

Haskell and Erlang: Growing Up Together
At Erlang Factory 2009
(video)

Interview at QCon 2008
(video)

Saturday, March 1, 2008

Haskell Style Pattern Matching In Arc

Since I did the Fluent Interface saga, it's only fair that I try to push the boundaries of metaprogramming in language that actually has metaprogramming.

Consider the following Haskell code. We will be striving to imitate it as closely as possible using an Arc macro.
union xs [] = xs
union [] ys = ys
union xs@(x:xt) ys@(y:yt) | x < y = x : union xt ys
| y < x = y : union xs yt
| otherwise = x : union xt yt
First we will see how this function might traditionally be written.
(def union (< xs ys)
(if (no xs) ys
(no ys) xs
(with (x (car xs) xt (cdr xs)
y (car ys) yt (cdr ys))
(if (< x y) (cons x (union < xt ys))
(< y x) (cons y (union < xs yt))
(cons x (union < xt yt))))))
As a useful side-note, Arc's let and with forms come with destructuring. This can save us a whole line here, as well as make the code more readable (no more cryptic cars and cdrs).
(def union (< xs ys)
(if (no xs) ys
(no ys) xs
(with ((x . xt) xs (y . yt) ys)
(if (< x y) (cons x (union < xt ys))
(< y x) (cons y (union < xs yt))
(cons x (union < xt yt))))))
This isn't bad. If we were programming in the wild, we would leave well enough alone at this point. However, this is an experiment; we want it to look Haskellian, not Lispy.

We'll start with pcase a 10-line macro using almkglor's pattern matching library to yield something similar to Scala's match-case statement.
(def union (< xs ys)
(pcase `(,xs ,ys)
(xs ()) xs
(() ys) ys
((x . xt) (y . yt))
(if (< x y) (cons x (union < xt ys))
(< y x) (cons y (union < xs yt))
(cons x (union < xt yt)))))
Switching from pcase to hcase, the outer parens in the patterns go away and equals signs separate the patterns from their corresponding expressions.
(def union (< xs ys)
(hcase `(,xs ,ys)
xs () = xs
() ys = ys
(x . xt) (y . yt) =
(if (< x y) (cons x (union < xt ys))
(< y x) (cons y (union < xs yt))
(cons x (union < xt yt)))))
To bring more Haskell syntax into the mix, we want to be able to substitute [] for () and (x:xt) for (x . xt). The colon and the brackets have special meanings in Arc, but our macro can intercept them without too much difficulty.
(def union (< xs ys)
(hcase `(,xs ,ys)
xs [] = xs
[] ys = ys
(x:xt) (y:yt) =
(if (< x y) (cons x (union < xt ys))
(< y x) (cons y (union < xs yt))
(cons x (union < xt yt)))))
Next, we can replace that pesky if statement with guard clauses.
(def union (< xs ys)
(hcase `(,xs ,ys)
xs [] = xs
[] ys = ys
(x:xt) (y:yt) / (< x y) = (cons x (union < xt ys))
/ (< y x) = (cons y (union < xs yt))
/ otherwise = (cons x (union < xt yt))))
The a single pipe | character can't be used here for technical reasons, so we'll have to make due with a slash /. A double pipe || could be used just as easily.

Since the guard conditions are enclosed on either side by / and =, we can make the outer parens optional. The xs@(x:xt) constructs known as "as-patterns" are the next addition. Having these as-patterns, we can define a simple macro to bring hcase up to the level of function arguments.
(mac hdef (name . body)
(let args (uniq)
`(def ,name ,args (hcase ,args ,@body))))

(hdef union
_ xs [] = xs
_ [] ys = ys
< xs@(x:xt) ys@(y:yt) / < x y = (cons x (union < xt ys))
/ < y x = (cons y (union < xs yt))
/ otherwise = (cons x (union < xt yt)))
The result: Lo! In a dazzling display of engineering hubris, a Lisp dialect is dragged kicking and screaming into Haskell-like syntax. If you dare, you can take look at the definition of hcase in all its convoluted glory.

Implementation note: Like most Lisps, Arc would read '(x : xs) as a list of three symbols, but '(x:xs) has only one. In order to behave properly, hcase must split apart the symbols in patterns: '(xs@(x:y:yt)) becomes '(xs @ (x : y : yt)) and so on.