How do I check dataframe column value in list of strings?

Question

I have a dataframe df and want to create a new dataframe df_b from it, keeping only the rows where the value in the column df['id'] is in my list list_of_ids.

Both df['id'] and list_of_ids contain string values.

I thought of using a regex, but it would be huge since list_of_ids has more than 20 elements, so I would need a generator over list_of_ids, and I don't know how to apply that.

I was thinking something like:

list_of_ids = ["thing1", "thing2", "thing3"]
df_b = df[df["id"].apply(lambda x: x in list_of_ids)]

Or I could use the .str.contains() method and pass it a string built from all the elements of list_of_ids separated by a pipe '|', but that doesn't seem "clean".

perl · Accepted Answer

Generating a sample DataFrame:

import string

import pandas as pd

# 50 rows: ids are the first 50 ASCII letters (a-z, then A-X)
n = 50
df = pd.DataFrame({
    'id': list(string.ascii_letters[:n]),
    'n': range(n)})
df.head()

Out:
    id  n
0   a   0
1   b   1
2   c   2
3   d   3
4   e   4

Selecting the rows whose id matches a value in the ids list, using Series.isin:

ids = ['a', 'd', 'x', 'A']
df[df['id'].isin(ids)]

Out:
    id  n
0   a   0
3   d   3
23  x   23
26  A   26
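
To tie this back to the question, here is a minimal sketch that builds df_b with isin, checks that it selects the same rows as the apply/lambda attempt, and negates the mask to get the remaining rows. It also shows why the pipe-joined .str.contains() idea is not equivalent: it matches substrings, not whole values. The names mask, df_rest and small are only illustrative, and the "thing10" value is a made-up example, not something from the question.

```python
import re
import string

import pandas as pd

# Same sample frame as above.
n = 50
df = pd.DataFrame({"id": list(string.ascii_letters[:n]), "n": range(n)})

list_of_ids = ["a", "d", "x", "A"]  # stand-in for the asker's list

# Exact-match membership test; isin returns a boolean Series.
mask = df["id"].isin(list_of_ids)
df_b = df[mask]

# The apply/lambda version from the question selects the same rows.
df_apply = df[df["id"].apply(lambda x: x in list_of_ids)]

# Rows whose id is NOT in the list: negate the mask with ~.
df_rest = df[~mask]

# Hypothetical data showing the difference from str.contains:
# a pattern built from "thing1" also matches "thing10".
small = pd.Series(["thing1", "thing10", "other"])
pattern = "|".join(re.escape(s) for s in ["thing1"])
exact = small.isin(["thing1"])
substring = small.str.contains(pattern)
```

So if the ids must match exactly, isin is the safer choice. The regex route only makes sense when partial matches are actually wanted.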
