Arbitrary inexact classifier

Suppose you have a classification system that does better than randomly guessing the classification attribute according to its distribution. Let C be the classification attribute and R the prediction. Then R carries information about C, so the conditional entropy of C given R is less than the entropy of C: CEnt(C|R) < Ent(C). Therefore, using R you can find an encoding of C that is shorter than the optimal encoding without using R. So if you encode the C column using the description of the classifier plus the optimal encoding of C given R, the result will be shorter than the optimal encoding of C without R, provided the column is long enough that the per-row saving outweighs the fixed cost of describing the classifier.
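The inequality can be checked on a small example. The following is a minimal sketch, not a definitive implementation: the toy columns `C` and `R`, the helper names, and the 100-bit classifier description are all hypothetical, and the entropies are empirical estimates computed from the columns themselves. It uses the identity CEnt(C|R) = Ent(C,R) - Ent(R).

```python
import math
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy of a column, in bits per row."""
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def conditional_entropy(c, r):
    """Empirical CEnt(C|R), computed as Ent(C,R) - Ent(R), in bits per row."""
    return entropy(list(zip(c, r))) - entropy(r)


def break_even_rows(description_bits, saving_per_row):
    """Smallest row count at which the total saving covers the classifier description."""
    return math.ceil(description_bits / saving_per_row)


# Hypothetical toy data: the prediction R matches C on 6 of 8 rows.
C = ["a", "a", "a", "a", "b", "b", "b", "b"]
R = ["a", "a", "a", "b", "b", "b", "b", "a"]

ent_c = entropy(C)
cent_c_given_r = conditional_entropy(C, R)
saving = ent_c - cent_c_given_r

# Assumed cost of describing the classifier, in bits (illustrative only).
DESCRIPTION_BITS = 100
rows_needed = break_even_rows(DESCRIPTION_BITS, saving)

print(f"Ent(C)    = {ent_c:.4f} bits/row")
print(f"CEnt(C|R) = {cent_c_given_r:.4f} bits/row")
print(f"saving    = {saving:.4f} bits/row")
print(f"rows needed to pay for the classifier: {rows_needed}")
```

On real data you would replace the toy columns with the actual C column and the classifier's predictions. The entropies are lower bounds on bits per row that a practical coder (for example, arithmetic coding) only approaches.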