Reputation: 199
I have a pandas data frame with the following format:
col1 col2 ... col4
A 2 [2-3-4]
B 3 [2-6]
A 3 [2-3-4]
C 2 [2-3-4]
D 2 [2-3-4]
I would like to select only the rows where the value in col2 is in the list of col4.
I tried to use:
df[df["col2"].isin(df["col4"].str.split("-"))]
but I get an empty data frame...
Upvotes: 2
Reputation: 5918
Code
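# Store both columns as strings so the check below is a plain substring test.
df['col4'] = df.col4.astype(str).str.replace('-', ',')
df['col2'] = df.col2.astype(str)
# Keep rows whose col2 text occurs in col4 (note: '2' would also match '22').
df = df[df.apply(lambda x: x.col2 in x.col4, axis=1)]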
Output
col1 col2 col4
0 A 2 [2,3,4]
2 A 3 [2,3,4]
3 C 2 [2,3,4]
4 D 2 [2,3,4]
Upvotes: 2
Reputation: 75080
I would use a list comprehension for this use case:
df[[str(a) in b for a,b in zip(df['col2'],df['col4'])]]
col1 col2 col4
0 A 2 [2-3-4]
2 A 3 [2-3-4]
3 C 2 [2-3-4]
4 D 2 [2-3-4]
Or use a regex search with word boundaries, which will not match 2 against 22 (thanks @Nk03):
import re
df[[bool(re.search(fr'\b{a}\b',b)) for a,b in zip(df['col2'],df['col4'])]]
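To see the difference between the two checks, here is a minimal self-contained sketch. It assumes col4 holds plain strings such as "[2-3-4]", and it adds a hypothetical row E (not in the question) where 2 only appears as part of 22:

```python
import re

import pandas as pd

# Frame rebuilt from the question; col4 is assumed to be a string column.
# Row "E" is a hypothetical addition where 2 only occurs inside 22.
df = pd.DataFrame({
    "col1": ["A", "B", "A", "C", "D", "E"],
    "col2": [2, 3, 3, 2, 2, 2],
    "col4": ["[2-3-4]", "[2-6]", "[2-3-4]", "[2-3-4]", "[2-3-4]", "[22-5]"],
})

# Plain substring test: "2" is found inside "22", so row E is kept.
substring_mask = [str(a) in b for a, b in zip(df["col2"], df["col4"])]

# Word-boundary test: only a standalone 2 matches, so row E is dropped.
regex_mask = [bool(re.search(rf"\b{a}\b", b)) for a, b in zip(df["col2"], df["col4"])]

substring_hits = df[substring_mask]
regex_hits = df[regex_mask]
```

On the original five rows both masks select the same rows; they only differ once a value like 22 shows up.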
Upvotes: 4
Reputation: 14949
You can try this:
import ast
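# Turn '[2-3-4]' into '[2,3,4]' and parse it into a real list of ints.
df.col4 = df.col4.str.replace('-', ',').apply(ast.literal_eval)
# Keep rows where the col2 value is an element of that list.
new_df = df[df.apply(lambda x: x['col2'] in x['col4'], axis=1)]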
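A variant of the same idea, sketched without ast: strip the brackets, split on "-", and convert the pieces to integers. This assumes col4 holds strings shaped like "[2-3-4]" with integer parts only. The name parsed is just illustrative, and df is left unmodified:

```python
import pandas as pd

# Frame rebuilt from the question; col4 is assumed to be a string column.
df = pd.DataFrame({
    "col1": ["A", "B", "A", "C", "D"],
    "col2": [2, 3, 3, 2, 2],
    "col4": ["[2-3-4]", "[2-6]", "[2-3-4]", "[2-3-4]", "[2-3-4]"],
})

# "[2-3-4]" -> "2-3-4" -> ["2", "3", "4"] -> [2, 3, 4]
parsed = (
    df["col4"]
    .str.strip("[]")
    .str.split("-")
    .apply(lambda parts: [int(p) for p in parts])
)

# Row-wise membership test of the integer in col2 against the parsed list.
mask = [value in items for value, items in zip(df["col2"], parsed)]
new_df = df[mask]
```

Because the comparison is between integers and list elements, 2 cannot accidentally match 22 here.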
Upvotes: 1