Wednesday, July 08, 2009

Can your programming language do this?

In information theory, entropy is the measurement of the amount of randomness in a probability distribution.

It's defined as

In my code, it's

(defn entropy [X]
(* -1 (Σ [i X] (* (p i) (log (p i))))))

Compare the similarity of my code above with the actual definition. Having a language that lets you declare constructs that look very similar to their mathematical definition is a huge win for readability.

The two important features I used in this example are

1) Clojure allows identifiers to be unicode
2) Macros

My macro is defined as

(defmacro Σ
"computes the sum of a series."
[[var-name values] & body]
`(reduce +
(map (fn [~var-name]
~@body) ~values)))

Of course, I have a similar one:

(defmacro Π
"computes the product of a series."

You can imagine how it looks.

The other big win here is I've abstracted out *how* the sum and product of a series are implemented. This allows me to do performance optimizations behind the scenes. For example, I could replace that call to map with a call to pmap.

The point of this isn't to show off how Clojure is better than other languages; I'm trying to show that these features are powerful. If your language of choice doesn't have these features, maybe it should.

Of course, macros are just a means to an end. I find them great, but that doesn't mean they're the One True Way to accomplish our goal of readable code. I'd love to see examples of solving this problem in other ways, in any language.


andreadallera said...

Hei, first of all, nice post, I'm a Clojure noob and i didn't know about macros before reading this. But i don't know, maybe it's just me,I won't say that your snippet is very readable.. those having a Lisp background probably would find this to be nonsense but to me this is miles away from the clarity you can achieve in languages like Ruby or Phyton. But as i said maybe it's just about where you come from :-)

Johannes said...


-sum(p(x) * log(p(x)) for x in Xs)

Extremely readable, viewable with any font, only uses basic language features.

andreadallera said...

and here's a ruby version, sorry i didn't post that before :

def p(x)
return x * 0.2

def log(x)
return Math.log(x)

class Array
def sum
inject(nil) { |sum,value| sum ? sum+yield(value) : yield(value) }

x = [1,2,3,4,5] #array

-x.sum { |xi| p(xi)*log(p(xi)) }

Couldn't use just standars features, added some methods to make the formula clearer.
Overriding p at the script level is no good though (p prints to the standard output in ruby)

Abhishek said...

I'm not convinced these need to be macros. What's the rationale behind that choice?

As functions:

(defn Σ [f coll]
(reduce + (map f coll)))

(defn entropy [nums]
(Σ #(* (p %) (log (p %))) nums))

I'd argue this is more elegant, as it's closer in spirit to the way mathematical functions are used (first-class, higher order).

Also, you can then pass Σ itself as an argument without resorting to lambda-lifting. Imagine using (comp (partial Σ g) h) or something.

Allen Rohner said...

Abhishek: As I said on HN, I didn't like the extra noise that would come from needing to use an anonymous function to pass to Σ. Given the feedback both here and on HN, maybe I should replace the macros with functions.

Shrutarshi said...

Now all we need is a text editor that can take the code and automatically generate the mathematical equivalent and display it. Unicode is great, but it's about time our coding tools started understanding the code.

Mark Watson said...

I don't think it's that great. As other commenters pointed out you can use functions instead, and who really cares about pretty summation signs. We're engineers, we don't need pretty signs we need solid, efficient code. Using unicode for our function names just makes stuff more brittle and a pain in the butt. I mean how do you even type a summation sign?

Abhishek said...


Whilst software engineers may not value it, some end users might -- such as scientists using an embedded domain-specific language for scientific programming.

Input is generally impractical at present. But it's still nice to know that programming systems are prepared for the day when it becomes easier.

Or perhaps instead of Sigma, you could make a case for accented Latin, or Cyrillic, or CJK...

Iain Buckingham said...

I've always felt that giving variables meaningful readable names is one of the most important things a programmer can do, since a piece of code is read more often that it is written. Equally important is separating out pieces of code and naming them, allowing work at different levels of abstraction.

Mathematical formulas frequently don't follow this principle, probably because brevity is a great advantage when writing by hand on a chalkboard or paper. What may be useful syntax when deriving a formula might not be so suitable for use in working code. The functional community in particular seem to feel that the more closely their code resembles mathematical formula the better. Instead they should think more about the convenience of readers and maintainers and not restrict comprehension because of conventions from the paper age. Jamming so much functionality into a single line is obfuscation not elegance.

pineapple said...



Just tested, works fine. No macros, no function definition, no nothing. Just define arrays p and i, and use the statement above in the interpreter.

pineapple said...


If you want, you can leave out i and just do it in terms of p: