To get an approach to statistics without the untenable assumption that the observed data have been generated by a 'true' distribution, a yardstick of the performance of a model P as a distribution is taken as the probability or density P(x) it assigns to the data. Equivalently, it can be taken as log1/P(x) , which has the interpretation of a code length as the number of bits when x is encoded as a binary string. Of two models P and Q , according to the Maximum Likelihood principle, the former is better if P(x)>Q(x) or log1/P(x)