Chris Sears

Reputation: 6802

Filtering Wordle words using a DataFrame

I have a DataFrame of random 5-letter "words". I'd like to filter them using criteria from the game Wordle.

For example, find all words that have 'a' in position 0 and 'c' in position 2, contain 'd', and contain neither 'b' nor 'e'.

That would correspond to guessing 'abcde' and getting the response: a=green, b=black, c=green, d=yellow, e=black.

I got it working using a MultiIndex and building a column for the presence of each letter, which feels rather inefficient. Is there a better approach?

import random
import string
import pandas as pd

rand_words = [''.join(random.choice(string.ascii_lowercase) for _ in range(5)) for _ in range(20000)]

tuples = [list(word) for word in rand_words]

index = pd.MultiIndex.from_tuples(tuples, names=["L0", "L1", "L2", "L3", "L4"])

df = pd.DataFrame({"word":rand_words}, index=index)

for ch in string.ascii_lowercase:
    df[ch] = df['word'].map(lambda word: ch in word)

# filter for 'a' and 'c' in positions 0 and 2
# then query for rows that don't contain 'b' or 'e', but do contain 'd'
print(df.xs(('a','c'), level=(0,2), drop_level=False).query('~b & d & ~e')['word'])

Output:

L0  L1  L2  L3  L4
a   d   c   j   q     adcjq
    h   c   d   n     ahcdn
    c   c   d   k     accdk
    s   c   z   d     asczd

Upvotes: 1

Views: 356

Answers (1)

Ben.T

Reputation: 29635

Here is a way using the str accessor: contains (or its negation with ~) tests whether a letter is present anywhere in the word, and positional indexing with [] combined with eq or ne tests whether a given position does or does not hold a letter. So in your case, you can do

import random
import string
import pandas as pd

random.seed(1) # for reproducibility
rand_words = [''.join(random.choice(string.ascii_lowercase) 
              for _ in range(5)) for _ in range(20000)]
df = pd.DataFrame({"word":rand_words})

print(
    df.loc[
        df['word'].str[0].eq('a') 
        & ~df['word'].str.contains('b') 
        & df['word'].str[2].eq('c') 
        & df['word'].str.contains('d') & df['word'].str[3].ne('d') 
        & ~df['word'].str.contains('e')
    ]
)

#         word
# 8902   agcsd
# 14816  adcyr

Note the condition df['word'].str[3].ne('d'): combined with df['word'].str.contains('d'), it ensures that d occurs somewhere in the word but not at position 3, which is how I understand a yellow response.
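The same three rules (green, yellow, black) can be applied in a loop rather than written out by hand for each guess. The following is only a minimal sketch: the function name wordle_mask and the 'g'/'y'/'b' feedback encoding are hypothetical, and it assumes the guess contains no repeated letters, since real Wordle colours repeated letters in a more involved way.

```python
import pandas as pd

def wordle_mask(words: pd.Series, guess: str, feedback: str) -> pd.Series:
    """Return a boolean mask of the words consistent with one guess.

    feedback has one character per letter of the guess (hypothetical encoding):
    'g' = green, 'y' = yellow, 'b' = black.
    Assumes the guess has no repeated letters.
    """
    mask = pd.Series(True, index=words.index)
    for pos, (letter, colour) in enumerate(zip(guess, feedback)):
        if colour == "g":
            # green: the letter is at exactly this position
            mask &= words.str[pos].eq(letter)
        elif colour == "y":
            # yellow: the letter is in the word, but not at this position
            mask &= words.str.contains(letter, regex=False) & words.str[pos].ne(letter)
        elif colour == "b":
            # black: the letter is not in the word at all
            mask &= ~words.str.contains(letter, regex=False)
        else:
            raise ValueError(f"unknown feedback character: {colour!r}")
    return mask

# Small hand-made sample instead of the 20000 random words
words = pd.Series(["adcxy", "abcde", "axcdz", "zdcaa", "adcye"])
print(words[wordle_mask(words, "abcde", "gbgyb")].tolist())
# ['adcxy']
```

With the DataFrame from above, this would be used as df.loc[wordle_mask(df['word'], 'abcde', 'gbgyb')], and masks from several guesses can be combined with &.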

Upvotes: 0
