Saturday, March 29, 2008

Returning '(()) Considered Difficult

Suppose you want a function to find all subsequences of a list with a certain length. For instance, the subsequences of (1...5) with length 2 are as follows.
(1 2) (1 3) (1 4) (1 5)
(2 3) (2 4) (2 5)
(3 4) (3 5)
(4 5)
This is not a red-herring example cooked up just to make Java look bad. This actually came up at work! I did something along these lines. Yuck.
private static <T> List<List<T>> subn(int n, List<T> li) {
List<List<T>> ret = new ArrayList<List<T>>();
if (n == 0) {
ret.add(new ArrayList<T>());
return ret;
}
if (li.isEmpty()) {
return ret;
}
T x = li.get(0);
List<T> xs = li.subList(1, li.size());
for (List<T> sub : subn(n-1, xs)) {
sub.add(0, x);
ret.add(sub);
}
ret.addAll(subn(n, xs));
return ret;
}
However, that was not the way my thoughts originally took shape. I think in Lisp.
(def subn (n li)
(if (is 0 n) '(())
(no li) '()
(join (map [cons (car li) _] (subn (- n 1) (cdr li)))
(subn n (cdr li)))))
Then I moved it to Haskell for greater clarity.
subn 0 _      = [[]]
subn _ [] = []
subn n (x:xs) = map (x:) (subn (n-1) xs) ++ subn n xs
With great pain, I converted those three lines of Haskell into the ugly splatter of Java you saw above. Later on, I started playing around with Scala and gradually compacted the Java into something strikingly similar to its Haskell ancestor.
def subn[T](n: int, li : List[T]) : List[List[T]] = (n,li) match {
case (0,_) => List(Nil)
case (_,Nil) => Nil
case (_,x::xs) => (subn(n-1, xs) map (x :: _)) ::: subn(n, xs)
}
The title of today's post refers to the inordinate amount of trouble it is to simply return "The list containing the empty list" in Java. I tried such monstrosities as these:
Collections.singletonList(Collections.EMPTY_LIST) // List<List>
Arrays.asList(Collections.emptyList()) // List<List<Object>>
Collections.singletonList(new ArrayList<T>()) // List<ArrayList<T>>
Collections.singletonList((List<T>)new ArrayList<T>()) // List<List<T>>
The last option does give the correct type, but it causes a compiler warning and is not much less cumbersome than the naive approach, i.e.:
List<List<T>> ret = new ArrayList<List<T>>();
ret.add(new ArrayList<T>());
return ret;
So here's a challenge for you Java developers out there. What would you change about the admittedly sloppy Java code at the beginning of this post? I'm issuing an all points bulletin to the refactoring mavericks (Mike, James), the simple-design visionaries (Steve), and the speed-freak optimizers (Dan). I wouldn't even mind seeing a C# version (Ethan), or an improvement on the Haskell (Eric).

Incidentally, I know some people think recursion is hard to read. I'd love to see an iterative solution as well.

P.S. I realize that the comment boxes don't hold code blocks very well. Feel free to link to a solution.


Updates:
Based on your input, I am now happy to give three different ways of returning '(()) in one line. I am only including ways that would work in the context I gave (i.e. the inner list must be mutable). The last is my favorite because the others cause compiler warnings.
return Arrays.<List<T>>asList(new ArrayList<T>());
// Type safety : A generic array of ArrayList is created for a varargs parameter

return new ArrayList<List<T>>() {{add(new ArrayList<T>());}};
// The serializable class does not declare a static final serialVersionUID...
// Ricky points out that this is an "Eclipse warning".

return Collections.<List<T>>singletonList(new ArrayList<T>());
// No warnings :)
Eric has blessed us with a version in Factor (not to be confused with my aversion to Factor). He also added a Haskell version using list comprehensions to achieve better laziness properties.

Meanwhile, back in Javaland, Dan suggested using SortedSet, particularly the subSet method. This seems to be a dead end in this case, but is certainly a useful corner of the Collection hierarchy.

After briefly attaining List Enlightenment, Thomas produced an implementation in Java with no recursion. Aside from being much longer and totally opaque, it also has worse performance charactoristics. It essentially produces every subsequence and then filters by length, similar to the Haskell one-liner suggested in the comments.

25 comments:

Ethan Vizitei said...

hmm...that IS a little frustrating. If I take that your primary problem is how ugly it is to just return a list containing an empty list, then we need to add some sugar to make that more pleasent. Obviously it's never going to look as good as '(()), but perhaps you can do something similar to your Collections.singletonList example, just with less words. If this is a problem that is going to come up frequently, I would mind seeing a signature like this:

Collections.EmptyNester<T>();

Both amusing and a little more succinct. You'll still be using the original "naive" method under the covers, but your loop would be a little less ugly. Like I said, though, not as good as '(()).

Maybe another option is to hack the byte-code compiler you're using and translate any occurance of '<T>(()) into the necessary function. ; )

Ray Myers said...

Ethan: Actually I dislike the entire 18-line method. The last 7 lines are really just a map and an append. I imagine C# 3's closures might help here.

Returning '(()) is a particularly icky part though. If I were going to use a different API, I would probably make List.add(...) return this, similar to StringBuilder. Then it becomes:

return new ArrayList<List<T>>()
.add(new ArrayList<T>());

That would also allow more general List construction: .add(1).add(2).add(3) and so forth.

James Iry said...

One of the things that often gets said when language wars heat up is that Java is easy to read. That's often pretty true in the small, but in the large (or even the medium) it's hard to defend. Even though I have written far, far more Java than any of the other languages (by several orders of magnitude) I still found the readability of your samples highest in Haskell, then Scala, then Lisp. Java was so far out of the running it didn't even compete - I really had to puzzle at it to see the essential algorithm buried in all the boilerplate.

Also, for grins consider this powerset function. If you do that and then filter for length you get the desired result (albeit inefficiently) in one line of code. Admittedly, the readability of that solution would be a bit debatable ;-)

Michael Finney said...

Interesting example, Ray. Java is verbose at times. Many would suggest taking that function and breaking it down into other functions (extract method refactoring).

Ray Myers said...

Ok Mike, I'll bite. Which piece of this should be its own method? What should be the name? Surely not "subnHelper"...

Michael Finney said...

I take the thing that hurts the most and try something with it.

The code around and within the for loop seems to hurt. A variable name or two seems to hurt.

I change code. I recompile. I capture the chunks into names. In other words, I name the concepts.

I have tasks I wish to jump into. This refactoring opportunity is not yet one of them. :) (Since the code is where I work... we'll see.)

Thomas Quick said...

I was unaware that we were trying to do this quickly; for various reasons. All I did was accept your challenge. :D

Ray Myers said...

Thomas: No, you successfully answered the challenge by producing the correct output with no recursion. I was just stating the caveats as they were. Be not afraid, you've still won free tickets to cadrlife.com and a copy of our home game!

Psyonic said...

Ray,

I wrote my own iterative version in Java that is significantly faster running than yours... 5-10x on average.

That said, it's definitely not easy on the eyes. I'd still go with the Haskell version, given the choice :)

I wrote a blog post about it here.

Psyonic said...

Ray,

Another comment about your solution... it seems to slowdown significantly as the size of the list grows.
I composed a list of [1..2000], and made all subsequences of size 2.

Here are my results:
Recursive version count: 1999000
Recursive version time: 89061 (ms)
Iterative version count: 1999000
Iterative version time: 710 (ms)

For this case, I'm getting a 125x speedup.

I don't have time to analyze this too deeply right now, but it'd be interesting to figure out what's causing such a dramatic slowdown in your solution.

Reinier Zwitserloot said...

Actually, java can do this just fine:

return Collections.<List<List<T>>>singletonList(Collections.<List<T>>emptyList());

The second generics parameter (List<T>) is either optional, or if it isn't, should have been optional. Fortunately there are some waves in java7 to ensure such stuff is definitely optional. In fact, if the statement is either an assignment or a return (which is a bit like an assignment), there are waves to aggressively infer the type. There is certainly no clear reason why this should be impossible.

In other words, I wouldn't be surprised if this works just fine in java7:

public <T> List<List<T>> returnSomething(T whatever) {
return Collections.singletonList(Collections.emptyList());
}

without warnings or errors.

Brandon Bloom said...

Ignore my previous post, it's obvious that it is past 5 am...

Here is an empty-list-free version in C# 3:

http://pastebin.com/f36788370

Brandon Bloom said...

Sorry for the multiple posts, but this is what a C# 3 solution looks like when you aren't smoking crack:

http://pastebin.com/f12176ec3

Brandon Bloom said...

OK OK I Promise! Last one. I'm sorry for spamming your blog.

http://pastebin.com/f54d396f

David R. MacIver said...

Slightly belated, but here's a nice Haskell one liner for doing it:

let subsequences n x = take (length x + 1 - n) . map (take n) . tails $ x

David R. MacIver said...

ok. It would help if I read the post properly. Ignore the one liner. :-)

Shaun Gilchrist said...

Not strictly answering your question, but what about this ugliness in php?
function subn($n){
$sl = Array();
for($x=1; $x<$n; $x++)
for($y=$x; $y<$n; $y++)
$sl[$x][] = Array($x,$y+1);
return $sl;
}

Psyonic said...

Shaun,

that works for subsequences of size 2, but his original function is generalized to work for any size subsequence. See my post above for an iterative approach in Java.

Ray Myers said...

Reinier is right on, though I think he meant:

return Collections.<List<T>>singletonList(Collections.<T>emptyList());

He also points out that there are efforts to improve the type inference :)

Incidently, emptyList() would not work in this context because subn would attempt to add items to the immutable list it returns.

This would work fine though:

return Collections.<List<T>>singletonList(new ArrayList<T>());

Christian Vest said...

Dirty Harry would do it like this:

http://pastebin.com/m12f57191

Chaddaï said...

Actually Psyonic, I don't know why the recursive solution performances fall so drastically in Java, but the Haskell function (at least Eric's version) seems to have performances on par with your iterative version.

So it's probably a Java thing, maybe the number of small object created (we should check if there's less in your solution.

--
Jedaï

Daniel Yokomizo said...

Either

new ArrayList<List<T>>() {{add(new ArrayList<T>());}};
and


Collections.singletonList(Collections.<T>emptyList());

should work. Another solution would be:

Arrays.asList(Arrays.<T>asList())

The different solutions give you nine possible combinations of mutable/mutable-with-fixed-size/immutable for both the internal and the external list (i.e. mutable = ArrayList, mutable with fixed size = Arrays.asList, immutable = Collections.(emptyList|singletonList)).
Not as nice as [[]] though, but if we try to write down all the possibilities in Haskell the other ones will be less beautiful than the immutable singleton list holding an immutable empty list. The main problem of Java (WRT this problem) is the lack of type inference, lack of suitable literal syntax for building collections and awfully named methods in libraries (due mostly to trying to make everything be a method of a class and lack of proper modules). Fixing those flaws we could even use something like: list(list()), which wouldn't be too bad.

Daniel Yokomizo said...

On a second thought if Java placed generic parameters in method calls between the name and the parenthesis (like it does with constructor calls and like C# does) we could write:

import static java.util.Arrays.asList;
...
return asList(asList<T>());

Which isn't too bad. Creating a static factory method list that calls builds an ArrayList given an varargs array of objects we could even get: list(list<T>()). While I agree that Java has many semantic problems, this particular problem is just a matter of syntax and library names.

Glen said...

Okay. Here's a groovy version. Just basically took the Java version pasted it in and reduced it one step at a time until I got something reasonable.

http://pastebin.com/f6eeff5ad

It would have been shorter if groovy had a nice built in method to let you get the cdr of a list.

Ricky Clarkson said...

From the update you gave, with []s instead of chevrons:

return new ArrayList[List[T]]() {{add(new ArrayList[T]());}};
// The serializable class does not declare a static final serialVersionUID...

That warning is from Eclipse, not from Java. Turn it off, it's useless (the warning, Eclipse, take your pick).