AsciiMathML

Wednesday, July 08, 2009

Can your programming language do this?

In information theory, entropy is the measurement of the amount of randomness in a probability distribution.

It's defined as



In my code, it's

(defn entropy [X]
(* -1 (Σ [i X] (* (p i) (log (p i))))))


Compare the similarity of my code above with the actual definition. Having a language that lets you declare constructs that look very similar to their mathematical definition is a huge win for readability.

The two important features I used in this example are

1) Clojure allows identifiers to be unicode
2) Macros

My macro is defined as

(defmacro Σ
"computes the sum of a series."
[[var-name values] & body]
`(reduce +
(map (fn [~var-name]
~@body) ~values)))


Of course, I have a similar one:

(defmacro Π
"computes the product of a series."

You can imagine how it looks.


The other big win here is I've abstracted out *how* the sum and product of a series are implemented. This allows me to do performance optimizations behind the scenes. For example, I could replace that call to map with a call to pmap.

The point of this isn't to show off how Clojure is better than other languages; I'm trying to show that these features are powerful. If your language of choice doesn't have these features, maybe it should.

Of course, macros are just a means to an end. I find them great, but that doesn't mean they're the One True Way to accomplish our goal of readable code. I'd love to see examples of solving this problem in other ways, in any language.

11 comments:

Anonymous said...

Hei, first of all, nice post, I'm a Clojure noob and i didn't know about macros before reading this. But i don't know, maybe it's just me,I won't say that your snippet is very readable.. those having a Lisp background probably would find this to be nonsense but to me this is miles away from the clarity you can achieve in languages like Ruby or Phyton. But as i said maybe it's just about where you come from :-)

Unknown said...

Python:

-sum(p(x) * log(p(x)) for x in Xs)

Extremely readable, viewable with any font, only uses basic language features.

Anonymous said...

and here's a ruby version, sorry i didn't post that before :

def p(x)
#dummy
return x * 0.2
end

def log(x)
return Math.log(x)
end

class Array
def sum
inject(nil) { |sum,value| sum ? sum+yield(value) : yield(value) }
end
end

x = [1,2,3,4,5] #array

-x.sum { |xi| p(xi)*log(p(xi)) }

Couldn't use just standars features, added some methods to make the formula clearer.
Overriding p at the script level is no good though (p prints to the standard output in ruby)

Unknown said...

I'm not convinced these need to be macros. What's the rationale behind that choice?

As functions:

(defn Σ [f coll]
(reduce + (map f coll)))

(defn entropy [nums]
(Σ #(* (p %) (log (p %))) nums))

I'd argue this is more elegant, as it's closer in spirit to the way mathematical functions are used (first-class, higher order).

Also, you can then pass Σ itself as an argument without resorting to lambda-lifting. Imagine using (comp (partial Σ g) h) or something.

Allen Rohner said...

Abhishek: As I said on HN, I didn't like the extra noise that would come from needing to use an anonymous function to pass to Σ. Given the feedback both here and on HN, maybe I should replace the macros with functions.

Unknown said...

Now all we need is a text editor that can take the code and automatically generate the mathematical equivalent and display it. Unicode is great, but it's about time our coding tools started understanding the code.

Mark Watson said...

I don't think it's that great. As other commenters pointed out you can use functions instead, and who really cares about pretty summation signs. We're engineers, we don't need pretty signs we need solid, efficient code. Using unicode for our function names just makes stuff more brittle and a pain in the butt. I mean how do you even type a summation sign?

Unknown said...

@Mark:

Whilst software engineers may not value it, some end users might -- such as scientists using an embedded domain-specific language for scientific programming.

Input is generally impractical at present. But it's still nice to know that programming systems are prepared for the day when it becomes easier.

Or perhaps instead of Sigma, you could make a case for accented Latin, or Cyrillic, or CJK...

Iain Buckingham said...

I've always felt that giving variables meaningful readable names is one of the most important things a programmer can do, since a piece of code is read more often that it is written. Equally important is separating out pieces of code and naming them, allowing work at different levels of abstraction.

Mathematical formulas frequently don't follow this principle, probably because brevity is a great advantage when writing by hand on a chalkboard or paper. What may be useful syntax when deriving a formula might not be so suitable for use in working code. The functional community in particular seem to feel that the more closely their code resembles mathematical formula the better. Instead they should think more about the convenience of readers and maintainers and not restrict comprehension because of conventions from the paper age. Jamming so much functionality into a single line is obfuscation not elegance.

Unknown said...

APL:

-+/p×p⍟i

Just tested, works fine. No macros, no function definition, no nothing. Just define arrays p and i, and use the statement above in the interpreter.

Unknown said...

APL:

If you want, you can leave out i and just do it in terms of p:

-+/p×p⍟⍳⍴p