There's a piece floating around that does a great, succinct job of summarizing Claude Shannon's contributions to our modern understanding of information. If you haven't read The bit bomb on Aeon, head over there. It'll make your brain happy with things like this:
"Shannon – mathematician, American, jazz fanatic, juggling enthusiast – is the founder of information theory, and the architect of our digital world. It was Shannon’s paper ‘A Mathematical Theory of Communication’ (1948) that introduced the bit, an objective measure of how much information a message contains."
The article digs deep into how predictable things are, especially language. It ends up focusing on how detecting patterns is what makes it possible to compress information:
"Shannon expanded this point by turning to a pulpy Raymond Chandler detective story […] He flipped to a random passage … then read out letter by letter to his wife, Betty. Her role was to guess each subsequent letter […] Betty’s job grew progressively easier as context accumulated […] a phrase beginning ‘a small oblong reading lamp on the’ is very likely to be followed by one of two letters: D, or Betty’s first guess, T (presumably for ‘table’). In a zero-redundancy language using our alphabet, Betty would have had only a 1-in-26 chance of guessing correctly; in our language, by contrast, her odds were closer to 1-in-2. "