Alexander Gladysh

Compose a synthetic English phrase that contains 160 bits of recoverable information

I have 160 bits of random data.

Just for fun, I want to generate a pseudo-English phrase to "store" this information in. I want to be able to recover the information from the phrase.

Note: This is not a security question. I don't care whether someone else can recover the information, or even detect that it is there.

Criteria for better phrases, from most to least important:

The current approach, suggested here:

Take three lists of 1024 nouns, verbs and adjectives each (picking the most popular ones). Generate a phrase by the following pattern, reading 10 bits for each word (1024 = 2^10, and 16 words × 10 bits = 160 bits):

Noun verb adjective verb,
Noun verb adjective verb,
Noun verb adjective verb,
Noun verb adjective verb.
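The four-line scheme can be sketched as follows. This is a minimal illustration, not the asker's code: the word lists are hypothetical placeholders (`noun0`, `verb17`, ...) standing in for real vocabulary, and `encode`/`decode` are illustrative names. Each word carries exactly 10 bits because every list has 2^10 entries. Words are looked up by their position in the pattern, so a word that appears in more than one list would still decode correctly.

```python
import secrets

# Hypothetical placeholder word lists; a real version would load the
# 1024 most popular nouns, verbs and adjectives from a corpus.
NOUNS = [f"noun{i}" for i in range(1024)]
VERBS = [f"verb{i}" for i in range(1024)]
ADJECTIVES = [f"adj{i}" for i in range(1024)]

# One line of the pattern is "noun verb adjective verb"; four lines = 16 words.
PATTERN = [NOUNS, VERBS, ADJECTIVES, VERBS] * 4
BITS_PER_WORD = 10  # 1024 == 2**10


def encode(data: int) -> list[str]:
    """Split a 160-bit integer into 16 10-bit chunks, most significant first."""
    assert 0 <= data < 2 ** (BITS_PER_WORD * len(PATTERN))
    words = []
    for i, word_list in enumerate(PATTERN):
        shift = BITS_PER_WORD * (len(PATTERN) - 1 - i)
        index = (data >> shift) & (2 ** BITS_PER_WORD - 1)
        words.append(word_list[index])
    return words


def decode(words: list[str]) -> int:
    """Look each word up in the list for its position and reassemble the bits."""
    data = 0
    for word, word_list in zip(words, PATTERN):
        data = (data << BITS_PER_WORD) | word_list.index(word)
    return data


if __name__ == "__main__":
    secret = secrets.randbits(160)
    phrase = encode(secret)
    for line in range(4):
        print(" ".join(phrase[4 * line : 4 * line + 4]))
    assert decode(phrase) == secret
```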

Now, this seems to be a good approach, but the phrase is a bit too long and a bit too dull.

I have found a corpus of words here (Part of Speech Database).

After some ad-hoc filtering, I calculated that this corpus contains approximately:

This allows me to use up to

For the noun-verb-adjective-verb pattern, this gives 57 bits per "sentence" of the phrase. This means that if I use all the words I can get from this corpus, I can generate three sentences instead of four (160 / 57 ≈ 2.8, so three sentences are enough).
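The sentence count is plain arithmetic: a list of n words carries floor(log2 n) whole bits, and the number of sentences is 160 divided by the bits per sentence, rounded up. A small sketch (the helper names are hypothetical):

```python
import math

TOTAL_BITS = 160


def bits_per_word(list_size: int) -> int:
    """Whole bits a word list can carry: floor(log2(list_size))."""
    return list_size.bit_length() - 1


def sentences_needed(bits_per_sentence: int, total_bits: int = TOTAL_BITS) -> int:
    """Number of sentences required, rounded up to a whole sentence."""
    return math.ceil(total_bits / bits_per_sentence)


# Original scheme: four 1024-word slots, 10 bits each, 40 bits per sentence.
print(sentences_needed(4 * bits_per_word(1024)))  # 4
# Larger corpus: 57 bits per sentence.
print(sentences_needed(57))  # 3
```

Three 57-bit sentences hold 171 bits, which leaves 11 bits of slack over the required 160.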

Noun verb adjective verb,
Noun verb adjective verb,
Noun verb adjective verb.

Still a bit too long and dull.

Any hints on how I can improve it?

Things I think I could try:

...I'm not good enough at English to come up with better phrase patterns. Any suggestions?

...I guess I would need a much better word corpus than the one I have now for that. Any hints on where I can get a suitable one?

Answers (1)

PleaseStand

I would consider adding adverbs to your list. Here is a pattern I came up with:

<Adverb>, the
    <adverb> <adjective>, <adverb> <adjective> <noun> and the
    <adverb> <adjective>, <adverb> <adjective> <noun>
<verb> <adverb> over the <adverb> <adjective> <noun>.

This can encode 181 bits of data, counting only whole bits per word. I derived this figure using lists I made a while back from WordNet data (probably a bit off because I included compound words):

  • 12650 usable nouns (13.6 bits/noun, rounded down)
  • 5247 usable adjectives (12.3 bits/adjective)
  • 5009 usable verbs (12.2 bits/verb)
  • 1512 usable adverbs (10.5 bits/adverb)
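The 181-bit figure can be checked by counting the data-carrying slots in the pattern (7 adverbs, 5 adjectives, 3 nouns, 1 verb) and flooring each list's log2 to whole bits. The sketch below uses the list sizes quoted above; it also prints the sum of the unrounded log2 values, which is the upper bound these lists could reach if the whole-bits restriction were dropped.

```python
import math

# List sizes quoted in the answer (WordNet-derived counts).
SIZES = {"noun": 12650, "adjective": 5247, "verb": 5009, "adverb": 1512}

# Data-carrying slots of the pattern in reading order; the literal
# words ("the", "and", "over") carry no data and are left out.
TEMPLATE = (
    ["adverb"]
    + ["adverb", "adjective", "adverb", "adjective", "noun"] * 2
    + ["verb", "adverb", "adverb", "adjective", "noun"]
)

# Fixed-width encoding: each slot carries floor(log2(n)) whole bits.
whole_bits = sum(SIZES[slot].bit_length() - 1 for slot in TEMPLATE)
# Upper bound if every word of every list is usable.
exact_bits = sum(math.log2(SIZES[slot]) for slot in TEMPLATE)

print(whole_bits)            # 181
print(f"{exact_bits:.1f}")   # 188.9
```

With whole bits this is 7 × 10 + 5 × 12 + 3 × 13 + 12 = 181 bits, comfortably above the required 160.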

Example sentence: "Soaking, the habitually goofy, socially speculative swatch and the fearlessly cataclysmic, somewhere reciprocal macrocosm foreclose angelically over the unavoidably intermittent comforter."
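One way to use the fractional bits as well (an assumption-labeled sketch, not part of the answer) is mixed-radix encoding: treat each slot as a digit whose base is the size of its word list, so every word in every list is usable. The sketch below fills the answer's pattern this way. The word lists are hypothetical placeholders (`adverb0`, `noun17`, ...) sized like the WordNet lists, `to_sentence`/`from_sentence` are illustrative names, and it assumes every word is a single lowercase token, which compound words would break.

```python
import math

# Hypothetical placeholder lists sized like the WordNet lists in the answer;
# a real implementation would load actual lowercase words instead.
SIZES = {"noun": 12650, "adjective": 5247, "verb": 5009, "adverb": 1512}
WORDS = {pos: [f"{pos}{i}" for i in range(n)] for pos, n in SIZES.items()}
INDEX = {pos: {w: i for i, w in enumerate(ws)} for pos, ws in WORDS.items()}

# The answer's pattern with one "{}" per data-carrying word.
TEMPLATE = ("{}, the {} {}, {} {} {} and the {} {}, {} {} {} "
            "{} {} over the {} {} {}.")
SLOTS = (["adverb"]
         + ["adverb", "adjective", "adverb", "adjective", "noun"] * 2
         + ["verb", "adverb", "adverb", "adjective", "noun"])
CAPACITY = math.prod(SIZES[pos] for pos in SLOTS)  # number of distinct sentences


def _tokens(text: str) -> list[str]:
    """Drop commas and the final period, lowercase, then split on whitespace."""
    return text.replace(",", " ").rstrip(".").lower().split()


def to_sentence(data: int) -> str:
    """Write data as mixed-radix digits; slot k uses base len(list_k)."""
    if not 0 <= data < CAPACITY:
        raise ValueError("data does not fit in this template")
    digits = []
    for pos in reversed(SLOTS):  # the last slot holds the least significant digit
        data, digit = divmod(data, SIZES[pos])
        digits.append(digit)
    digits.reverse()
    words = [WORDS[pos][d] for pos, d in zip(SLOTS, digits)]
    sentence = TEMPLATE.format(*words)
    return sentence[0].upper() + sentence[1:]


def from_sentence(sentence: str) -> int:
    """Pick out the data words by their template position and rebuild data."""
    template_tokens = _tokens(TEMPLATE)
    words = [tok for tok, slot in zip(_tokens(sentence), template_tokens)
             if slot == "{}"]
    data = 0
    for pos, word in zip(SLOTS, words):
        data = data * SIZES[pos] + INDEX[pos][word]
    return data
```

Because the capacity is the product of the list sizes rather than a power of two, this packs about 188.9 bits into the same 16 words, versus 181 with whole bits per word.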