MDL can provide an answer. If the language does obey some non-trivial grammar, then that grammar can be used to derive a more compact representation of a corpus of sentences. For instance, consider a context-free grammar in Chomsky normal form, where every rule has the form NON-TERMINAL -> NON-TERMINAL NON-TERMINAL or NON-TERMINAL -> WORD. We can develop an encoding as follows: for each non-terminal, number the productions that expand it. Then, traversing the parse tree depth-first, record the number of the production used at each node. The result is generally an encoding that is somewhat shorter than the original sentence written word by word, since the number of choices at each step (the productions available for one non-terminal) is smaller than the number of words in the language, so each choice takes fewer bits to specify.
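The encoding described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy grammar `GRAMMAR`, the tuple representation of parse trees, and the helper names `encode`, `decode`, and `code_bits` do not come from any particular library. The code length is idealized as log2 of the number of productions available at each node.

```python
import math

# Hypothetical toy grammar in Chomsky normal form. Each non-terminal maps to
# an ordered list of productions, so a production is identified by its index.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP")],
    "Det": [("the",), ("a",)],
    "N": [("dog",), ("cat",)],
    "V": [("sees",), ("chases",)],
}

def encode(tree):
    """Return the production indices met in a depth-first (pre-order) walk."""
    head, *children = tree
    # Right-hand side: child labels for internal nodes, the word for leaves.
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    codes = [GRAMMAR[head].index(rhs)]
    for child in children:
        if isinstance(child, tuple):
            codes.extend(encode(child))
    return codes

def decode(codes, start="S"):
    """Rebuild the sentence by replaying the production indices in order."""
    remaining = iter(codes)

    def expand(symbol):
        if symbol not in GRAMMAR:  # a word, not a non-terminal
            return [symbol]
        words = []
        for part in GRAMMAR[symbol][next(remaining)]:
            words.extend(expand(part))
        return words

    return expand(start)

def code_bits(tree):
    """Idealized code length: log2(number of choices) summed over all nodes."""
    head, *children = tree
    bits = math.log2(len(GRAMMAR[head]))
    return bits + sum(code_bits(c) for c in children if isinstance(c, tuple))

# Parse tree for "the dog chases a cat" as nested (label, children...) tuples.
TREE = ("S",
        ("NP", ("Det", "the"), ("N", "dog")),
        ("VP", ("V", "chases"),
               ("NP", ("Det", "a"), ("N", "cat"))))

codes = encode(TREE)
sentence = decode(codes)
grammar_bits = code_bits(TREE)
# Baseline: spell out each word with log2(vocabulary size) bits.
vocabulary = {rhs[0] for rules in GRAMMAR.values() for rhs in rules if len(rhs) == 1}
plain_bits = len(sentence) * math.log2(len(vocabulary))
```

In this toy case the non-terminals with a single production cost nothing to encode, so the five-word sentence takes 5 bits under the grammar against roughly 12.9 bits when each word is chosen freely from the six-word vocabulary. The saving depends entirely on the grammar assumed here, and a full MDL comparison would also charge for the cost of describing the grammar itself.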