I have a dataframe df and want to create a new dataframe df_b from it, keeping only the rows where the value in column df['id'] is in my list list_of_ids. Both df['id'] and list_of_ids contain string values.
I thought of using a regex, but it would be huge because list_of_ids has more than 20 elements. I would need to generate it from list_of_ids, and I don't know how to do that.
I was thinking of something like:
list_of_ids = ["thing1", "thing2", "thing3"]
# apply already returns a boolean Series, so "== True" is unnecessary
df_b = df[df["id"].apply(lambda x: x in list_of_ids)]
Alternatively, I could use the .str.contains() method with a pattern built by joining all the elements of list_of_ids with a pipe '|', but that doesn't seem "clean".
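As a minimal sketch of why the pipe-joined pattern is riskier than it looks: str.contains searches for the pattern anywhere in the string, so an id such as "thing10" (hypothetical sample data) is kept because it contains "thing1". Escaping each id with re.escape and switching to str.fullmatch (available in pandas 1.1 and later) restores the exact matching that the apply version performs.

```python
import re

import pandas as pd

# Hypothetical sample data; "thing10" is included to show an unwanted partial match.
df = pd.DataFrame({"id": ["thing1", "thing10", "thing2", "other"]})
list_of_ids = ["thing1", "thing2", "thing3"]

# Exact membership test per row via apply.
by_apply = df[df["id"].apply(lambda x: x in list_of_ids)]

# Pipe-joined pattern; re.escape protects ids that contain regex metacharacters.
pattern = "|".join(re.escape(i) for i in list_of_ids)

# str.contains matches anywhere in the string, so "thing10" is kept as well.
by_contains = df[df["id"].str.contains(pattern)]

# str.fullmatch requires the whole string to match one of the alternatives.
by_fullmatch = df[df["id"].str.fullmatch(pattern)]

print(by_apply["id"].tolist())      # ['thing1', 'thing2']
print(by_contains["id"].tolist())   # ['thing1', 'thing10', 'thing2']
print(by_fullmatch["id"].tolist())  # ['thing1', 'thing2']
```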
Generate a sample DataFrame:
import string

import pandas as pd

n = 50
df = pd.DataFrame({
    'id': list(string.ascii_letters[:n]),
    'n': range(n)})
df.head()
Out:
  id  n
0  a  0
1  b  1
2  c  2
3  d  3
4  e  4
Select the rows whose id is in the ids list. Series.isin performs an exact membership test, so neither a regex nor apply is needed:
ids = ['a', 'd', 'x', 'A']
df[df['id'].isin(ids)]
Out:
   id   n
0   a   0
3   d   3
23  x  23
26  A  26
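Applied to the names from the question, the same idea looks roughly like this. It is a sketch: the DataFrame contents and the "value" column are hypothetical stand-ins for the asker's df, and df_rest is a hypothetical name showing the inverse selection with ~.

```python
import pandas as pd

# Hypothetical stand-in for the question's df; only the "id" column matters here.
df = pd.DataFrame({"id": ["thing1", "thing4", "thing2", "thing3"],
                   "value": [10, 20, 30, 40]})
list_of_ids = ["thing1", "thing2", "thing3"]

# Boolean mask: True where df["id"] equals some element of list_of_ids.
mask = df["id"].isin(list_of_ids)

# Keep the matching rows (original index labels are preserved); .copy() makes
# df_b an independent DataFrame rather than a slice of df.
df_b = df[mask].copy()

# Inverse selection: rows whose id is NOT in the list.
df_rest = df[~mask]

print(df_b)
print(df_rest)
```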