Rickie Marsh
Rickie Marsh

Reputation: 53

Really big permutation list

My question is not language specific. I'm having issues with getting the loop to process permutations. I'm attempting to code something to display all values for 26^x where x is the length of a string. No input string will be supplied so if x=1, it'll display a through z, if x=2 itll display aa through zz. az is seen as different from za.

More specifically, I'm wanting to run this for longer strings, 100+ characters in length in an attempt to see how many strings of a given length containing words as opposed to random letters.

Upvotes: 2

Views: 210

Answers (3)

ElKamina
ElKamina

Reputation: 7817

It would depend on your definition of 'word'. If 'a' is a word it is very easy to get a lower limit on probability of getting a word in a 100 character sequence (roughly 1- 1/e^4). Similarly you can consider 2 letter words and 3 letter words and refine the probability. After 4 or 5 letters this probability becomes really accurate because there are few longer words and them randomly occurring is very rare.

Upvotes: 0

mfsiega
mfsiega

Reputation: 2872

Per the comment on the question, it's somewhat impractical to try to enumerate all of the possible 100-character strings.

I would suggest the alternate strategy of generating random strings of the given length, rather than enumerating in a structured manner. Something like:

count = 0
for i from 0 to simulation_length:
    random_string = ''
    for j from 0 to string_length:
        random_string += random_char()
    // containsWord(string) checks if the random string contains a word
    // this is tricky in and of itself
    if (containsWord(random_string)) count++
...

The random sampling will give you a representation of the behavior across the whole space, as long as simulation_length is sufficient.

Upvotes: 1

SigTerm
SigTerm

Reputation: 26439

26^x where x is the length of a string ... I'm wanting to run this for longer strings, 100+ characters in length

You should forget about it.

Let's put things in perspective. There are 26 letters in english alphabet, so total number of strings with 100 characters in them is ...

3142930641582938830174357788501626427282669988762475256374173175398995908420104023465432599069702289330964075081611719197835869803511992549376

That's decimal number. At the speed of 1 string per millisecond it'll take 9.9*10^130 years to print them all. That's 7.3*10^120 times longer than universe has existed.

Get a word list or load dictionary into memory and use it instead.

Upvotes: 1

Related Questions