Reputation: 55
I have a data feed where one column contains lists. I also have a list of values I want to detect. I want to create a new column that has 1 when one or more of the column values is in the list of values, and 0 when none of the items in the column list are in the detection list.
# df with list
job type
0 a [300]
1 b [234, 177]
2 c [-1]
# Detection List
impacted = pd.Series([174, 175, 176, 177])
# Desired df
job type risky
0 a [300]. 0
1 b [234, 177]. 1
2 c [-1]. 0
I've tried numerous techniques, but can't find the correct pattern to compare the column list to the detection list:
df.loc[df['type'].map(set).isin(impacted.str).any(1), 'risky'] = 1
df.loc[df['type'].isin(impacted), 'risky'] = 1
if set(impacted).intersection(df['type']):
df.risky = 1
These all fail for various reasons. Any suggestions?
Upvotes: 0
Views: 92
Reputation: 55
w00t! @QuangHoang led me in the right direction. Just needed to add this before the .explode line:
impacted = list(map(str, impacted))
The logic needed strings, not int. Now it works perfectly!
Upvotes: 0
Reputation: 13858
Since you already have your match list in a Series, you can make use of the isin
and sum
method:
df['risky'] = df['type'].apply(impacted.isin).sum(axis=1)
Which gives you:
job type risky
0 a [300] 0
1 b [234, 177] 1
2 c [-1] 0
The useful part of this is sum
will also give you the number of matches, given:
job type
0 a [300]
1 b [234, 177]
2 c [-1]
3 d [174, 175, 176]
The result will be:
job type risky
0 a [300] 0
1 b [234, 177] 1
2 c [-1] 0
3 d [174, 175, 176] 3
If all you care about is a 0
or 1
for True/False, use any
:
df.assign(risky=df['type'].apply(impacted.isin).any(axis=1).astype(int))
job type risky
0 a [300] 0
1 b [234, 177] 1
2 c [-1] 0
3 d [174, 175, 176] 1
Upvotes: 0
Reputation: 28233
a straight-forward but rather slow (performance wise) way to do this would be to convert both the list & the series to set and check if the intersection is non-empty
df['risky'] = df.type.apply(lambda x: 1 if set(x).intersection(set(impacted)) else 0)
Upvotes: 0
Reputation: 150735
If you work with pandas 0.25+ you can try explode
:
df['risky'] = (df['type'].explode()
.isin(impacted).any(level=0)
.astype(int)
)
Output:
job type risky
0 a [300] 0
1 b [234, 177] 1
2 c [-1] 0
Upvotes: 3