Reputation: 6802
I have a DataFrame with random 5-letter "words". I'd like to filter them using some criteria from the game Wordle.
For example, find all words which have 'a' in position 0 and 'c' in position 2, contain 'd' somewhere, and contain neither 'b' nor 'e'.
That would correspond to guessing 'abcde' and getting the response: a=green, b=black, c=green, d=yellow, e=black.
I got it working using a MultiIndex and building a column for the presence of each letter, which feels rather inefficient. Is there a better approach?
import random
import string
import pandas as pd
rand_words = [''.join(random.choice(string.ascii_lowercase) for _ in range(5)) for _ in range(20000)]
tuples = [list(word) for word in rand_words]
index = pd.MultiIndex.from_tuples(tuples, names=["L0", "L1", "L2", "L3", "L4"])
df = pd.DataFrame({"word":rand_words}, index=index)
for ch in string.ascii_lowercase:
    df[ch] = df['word'].map(lambda word: ch in word)
# filter for 'a' and 'c' in positions 0 and 2
# then query for rows that don't contain 'b' or 'e', but do contain 'd'
print(df.xs(('a','c'), level=(0,2), drop_level=False).query('~b & d & ~e')['word'])
Output:
L0  L1  L2  L3  L4
a   d   c   j   q     adcjq
    h   c   d   n     ahcdn
    c   c   d   k     accdk
    s   c   z   d     asczd
Upvotes: 1
Views: 356
Reputation: 29635
Here is a way using the str accessor: sometimes with contains (or its inverse, via ~) to test whether a letter is present, and sometimes with positional indexing [] to get one letter and compare it for equality (eq) or inequality (ne). In your case, you can do:
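import random
import string
import pandas as pd

random.seed(1)  # for reproducibility
rand_words = [''.join(random.choice(string.ascii_lowercase) for _ in range(5))
              for _ in range(20000)]
df = pd.DataFrame({"word": rand_words})
print(
    df.loc[
        df['word'].str[0].eq('a')            # green: 'a' at position 0
        & ~df['word'].str.contains('b')      # black: no 'b' anywhere
        & df['word'].str[2].eq('c')          # green: 'c' at position 2
        & df['word'].str.contains('d') & df['word'].str[3].ne('d')  # yellow: 'd' present, but not at position 3
        & ~df['word'].str.contains('e')      # black: no 'e' anywhere
    ]
)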
#         word
# 8902   agcsd
# 14816  adcyr
Note the part & df['word'].str[3].ne('d'): it ensures that d is not at that position, while df['word'].str.contains('d') ensures that it exists somewhere in the word, which is what I understand yellow to mean.
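If you need this for arbitrary guesses, the same three conditions can be built in a loop. The following is only a sketch: the function name wordle_mask and the one-character colour codes ('g', 'y', 'b') are made up for illustration, and it assumes no letter is repeated in the guess (the real game's duplicate-letter rules are not handled).

```python
import pandas as pd


def wordle_mask(words: pd.Series, guess: str, colors: str) -> pd.Series:
    """Return a boolean mask of the words consistent with one guess.

    `colors` is a hypothetical encoding with one character per guessed letter:
    'g' = green, 'y' = yellow, 'b' = black.
    Assumes each letter appears at most once in `guess`.
    """
    if len(guess) != len(colors):
        raise ValueError("guess and colors must have the same length")

    mask = pd.Series(True, index=words.index)
    for pos, (letter, color) in enumerate(zip(guess, colors)):
        if color == "g":
            # green: the letter is at exactly this position
            mask &= words.str[pos].eq(letter)
        elif color == "y":
            # yellow: the letter is in the word, but not at this position
            mask &= words.str.contains(letter, regex=False)
            mask &= words.str[pos].ne(letter)
        elif color == "b":
            # black: the letter is nowhere in the word
            mask &= ~words.str.contains(letter, regex=False)
        else:
            raise ValueError(f"unknown color code: {color!r}")
    return mask


words = pd.Series(["adcjq", "ahcdn", "agcsd", "abcde", "zzzzz"])
matches = words[wordle_mask(words, "abcde", "gbgyb")]
print(matches.tolist())
```

With the DataFrame above this would be used as df.loc[wordle_mask(df['word'], 'abcde', 'gbgyb')]. Note that 'ahcdn' is excluded in the small example because its 'd' sits at position 3, which the yellow rule forbids.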
Upvotes: 0