JW_Swartz
JW_Swartz

Reputation: 55

Matching item in list with item in column containing list

I have a data feed where one column contains lists. I also have a list of values I want to detect. I want to create a new column that has 1 when one or more of the column values is in the list of values, and 0 when none of the items in the column list are in the detection list.

# df with list
       job               type
0        a              [300]
1        b         [234, 177]
2        c               [-1]

# Detection List
impacted = pd.Series([174, 175, 176, 177])

# Desired df
       job               type     risky
0        a              [300].        0
1        b         [234, 177].        1
2        c               [-1].        0

I've tried numerous techniques, but can't find the correct pattern to compare the column list to the detection list:

df.loc[df['type'].map(set).isin(impacted.str).any(1), 'risky'] = 1

df.loc[df['type'].isin(impacted), 'risky'] = 1

if set(impacted).intersection(df['type']):
    df.risky = 1

These all fail for various reasons. Any suggestions?

Upvotes: 0

Views: 92

Answers (4)

JW_Swartz
JW_Swartz

Reputation: 55

w00t! @QuangHoang led me in the right direction. Just needed to add this before the .explode line:

impacted = list(map(str, impacted))

The logic needed strings, not int. Now it works perfectly!

Upvotes: 0

r.ook
r.ook

Reputation: 13858

Since you already have your match list in a Series, you can make use of the isin and sum method:

df['risky'] = df['type'].apply(impacted.isin).sum(axis=1)

Which gives you:

  job        type  risky
0   a       [300]      0
1   b  [234, 177]      1
2   c        [-1]      0

The useful part of this is sum will also give you the number of matches, given:

  job             type
0   a            [300]
1   b       [234, 177]
2   c             [-1]
3   d  [174, 175, 176]

The result will be:

  job             type  risky
0   a            [300]      0
1   b       [234, 177]      1
2   c             [-1]      0
3   d  [174, 175, 176]      3

If all you care about is a 0 or 1 for True/False, use any:

df.assign(risky=df['type'].apply(impacted.isin).any(axis=1).astype(int))

  job             type  risky
0   a            [300]      0
1   b       [234, 177]      1
2   c             [-1]      0
3   d  [174, 175, 176]      1

Upvotes: 0

Haleemur Ali
Haleemur Ali

Reputation: 28233

a straight-forward but rather slow (performance wise) way to do this would be to convert both the list & the series to set and check if the intersection is non-empty

df['risky'] = df.type.apply(lambda x: 1 if set(x).intersection(set(impacted)) else 0)

Upvotes: 0

Quang Hoang
Quang Hoang

Reputation: 150735

If you work with pandas 0.25+ you can try explode:

df['risky'] = (df['type'].explode()
                  .isin(impacted).any(level=0)
                  .astype(int)
              )

Output:

  job        type  risky
0   a       [300]      0
1   b  [234, 177]      1
2   c        [-1]      0

Upvotes: 3

Related Questions