Naturally, I started wondering: are all words perhaps connected in one large graph of synonyms? Or are words clustered into several groups with more or less similar meanings? And if so, how many such groups are there? A few tens? Some hundreds? If you want to try to guess, this is your last chance.
Being the maniac I am, and having some basic knowledge of graph theory, I got my hands on a digital version of a thesaurus and hacked together some quick scripts to find the answers. These are the results.
I used The Oxford American Writer's Thesaurus, which is also the one used by the Dictionary application on Macs. The version of the thesaurus I used has 31'673 ‘senses’ (i.e. entries relating a word to many others with the same meaning) and a total of 52'307 different words and short phrases.
Apart from a small set of 193 disconnected senses (more details later), the remaining 31'480 senses are all linked together in a single group of connected words. This means that from pretty much every word in the thesaurus you can get to every other word just by following chains of synonyms.
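For readers who want to try this at home, here is a minimal sketch of how senses could be grouped into connected sets. It is not the script actually used for the numbers above. It assumes the thesaurus has already been parsed into a dictionary mapping each sense name to the set of words it lists (headword included); the function name `sense_components` and the toy data are made up for illustration.

```python
from collections import defaultdict


def sense_components(senses):
    """Group senses that are linked, directly or indirectly, by a shared word.

    `senses` maps a sense name to the set of words listed in that entry
    (an assumed input format, not the thesaurus's real file layout).
    """
    parent = {sense: sense for sense in senses}

    def find(sense):
        # Follow parent links to the group's representative, shortening
        # the chain along the way (path halving).
        while parent[sense] != sense:
            parent[sense] = parent[parent[sense]]
            sense = parent[sense]
        return sense

    first_sense_for_word = {}
    for sense, words in senses.items():
        for word in words:
            if word in first_sense_for_word:
                # Two senses share this word, so merge their groups.
                parent[find(sense)] = find(first_sense_for_word[word])
            else:
                first_sense_for_word[word] = sense

    groups = defaultdict(set)
    for sense in senses:
        groups[find(sense)].add(sense)
    # Largest group first.
    return sorted(groups.values(), key=len, reverse=True)


# Toy data: two senses share the word 'mean', the third shares nothing.
toy = {
    "accomplished": {"accomplished", "good", "mean"},
    "base": {"base", "mean", "bad"},
    "luggage": {"luggage", "trunks"},
}
components = sense_components(toy)
print([len(group) for group in components])  # [2, 1]
```

On the real data, the first group returned would be the big connected one, and the remaining groups would be the small disconnected leftovers.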
Even words with opposite meanings, such as ‘good’ and ‘bad’, are connected, and not very far apart. Just by looking at two entries, one finds that ‘good’ is listed as a synonym of ‘mean’ (in the sense of accomplished), while ‘mean’ is listed as a synonym of ‘bad’ (in the sense of base). If you are skeptical, here are the two relevant entries taken directly from the thesaurus:
accomplished: an accomplished bassoonist expert, skilled, skillful, masterly, successful, virtuoso, master, consummate, complete, proficient, talented, gifted, adept, adroit, deft, dexterous, able, good, competent, capable, efficient, experienced, seasoned, trained, practiced, professional, polished, ready, apt; informal great, mean, nifty, crack, ace, wizard; informal crackerjack.
base2: base motives sordid, ignoble, low, low-minded, mean, immoral, improper, unseemly, unscrupulous, unprincipled, dishonest, dishonorable, shameful, bad, wrong, evil, wicked, iniquitous, sinful. antonym noble.
Some more interesting trivia: in this large connected group, every word is connected to every other word by following, on average, 3.81 senses. So paths between different words tend to be very short. The center of the thesaurus is the word
from which you can get to any other connected word in an average of 2.66 senses.
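Both figures come down to shortest-path distances, which a breadth-first search from every word can produce. The sketch below is one way to do it, under assumptions of my own: two words count as one sense apart when they appear in the same entry, the input is the same hypothetical sense-to-words dictionary, and the graph passed in is a single connected group. The helper names are illustrative only.

```python
from collections import defaultdict, deque


def word_graph(senses):
    """Link every pair of words that appear together in the same sense."""
    adjacency = defaultdict(set)
    for words in senses.values():
        for word in words:
            adjacency[word].update(words - {word})
    return dict(adjacency)


def distances_from(adjacency, source):
    """Breadth-first search: number of senses from `source` to each word."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        word = queue.popleft()
        for neighbour in adjacency[word]:
            if neighbour not in dist:
                dist[neighbour] = dist[word] + 1
                queue.append(neighbour)
    return dist


def center_and_mean_distance(adjacency):
    """Return (center word, its average distance, overall average distance)."""
    best_word, best_avg = None, float("inf")
    total, pairs = 0, 0
    for word in adjacency:
        dist = distances_from(adjacency, word)
        others = len(dist) - 1  # reachable words, excluding `word` itself
        if others == 0:
            continue
        reach = sum(dist.values())
        total += reach
        pairs += others
        if reach / others < best_avg:
            best_word, best_avg = word, reach / others
    return best_word, best_avg, total / pairs


# Toy data only; real entries list many more words.
toy = {
    "accomplished": {"good", "mean", "expert"},
    "base": {"mean", "bad"},
}
center, center_avg, overall_avg = center_and_mean_distance(word_graph(toy))
print(center, center_avg)  # mean 1.0
```

Running one search per word is quadratic in the worst case, which is affordable for a graph of some tens of thousands of words but would need more care on something much larger.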
The most distant pair of words is only 9 senses apart. These are the short phrases ‘swimming trunks’ and ‘in any other way’, which are connected by the following chain of senses:
bathing suit: swimming trunks - bathing suit
swimsuit: bathing suit - trunks
luggage: trunks - luggage
possession: luggage - assets
saving: assets - saving
but: saving - but
but: but - on the other hand
alternatively: on the other hand - otherwise
otherwise: otherwise - in any other way
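A chain like this one can be recovered by remembering, during a breadth-first search, which word and which sense led to each newly reached word. The following is a rough sketch under the same assumptions as before (a hypothetical dictionary from sense name to the set of words in that entry); `sense_chain` is an illustrative name, and the toy data is just the ‘good’/‘bad’ example from earlier.

```python
from collections import defaultdict, deque


def sense_chain(senses, start, goal):
    """Return a shortest chain [(sense, from_word, to_word), ...], or None."""
    # Index: word -> senses whose entry contains it.
    senses_of = defaultdict(list)
    for sense, words in senses.items():
        for word in words:
            senses_of[word].append(sense)

    came_from = {start: None}  # word -> (previous word, sense used)
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            break
        for sense in senses_of[word]:
            for nxt in senses[sense]:
                if nxt not in came_from:
                    came_from[nxt] = (word, sense)
                    queue.append(nxt)

    if goal not in came_from:
        return None  # the two words are in different groups

    # Walk back from the goal to the start, then reverse.
    chain = []
    word = goal
    while came_from[word] is not None:
        previous, sense = came_from[word]
        chain.append((sense, previous, word))
        word = previous
    chain.reverse()
    return chain


toy = {
    "accomplished": {"good", "mean"},
    "base": {"mean", "bad"},
}
for sense, a, b in sense_chain(toy, "good", "bad"):
    print(f"{sense}: {a} - {b}")
# accomplished: good - mean
# base: mean - bad
```

Finding the most distant pair is then a matter of running the search from every word and keeping the longest chain seen, which is brute force but fine at this scale.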
The ‘boring’ words, which are disconnected from everything else, are usually words that list only a few synonyms and/or some usage notes in the thesaurus. The biggest group of such words has only 5 related senses. For the curious, this is the full list of disconnected senses.
Also, for those interested, the programming techniques I used to compute this information are very similar to those used by Stephen Dolan in his Six Degrees of Wikipedia. However, since my graph is considerably smaller, the results were obtained just by leaving a single computer running overnight.