Philip C.

Reputation: 31

python pandas data frame create new column with label (0 or 1) if row value is in series

I'm pretty new to Python and pandas and have a small problem. I searched the forums and Google but couldn't find a solution, so here we go:

I have a series that contains unique names:

In [8]: Name_Series.head()
Out[8]: 
0     US2005642
1     US2007961
2       US13721
3     US2013770
4       US14822
dtype: object

In my dataframe there is a column that contains one name per row.

In [5]: df.Name.head()
Out[5]: 
0    JP2015121
1      US14822
2      US14358
3    JP2015539
4    JP2015156
Name: Name, dtype: object

What I need is a new column 'Label' that contains a 1 if Name is contained in Name_Series and a 0 if it is not.

My idea was to write a function that returns 1 or 0 and apply it to the dataframe:

def Label(Name_Series, Name):
    if Name_Series.str.contains(Name).sum() > 0:
        return 1
    else:
        return 0

df['Label'] = list(map(Label, Name_Series, df.Name))

Unfortunately this leads to the following error:

In [9]: df['Label'] = list(map(Label, Name_Series, df.Name))
Traceback (most recent call last):

  File "<ipython-input-9-713d2d55d303>", line 1, in <module>
    df['Label'] = list(map(Label, Name_Series, df.Name))

  File "Test.py", line 60, in Label
    if Name_Series.str.contains(Name).sum()>0:

AttributeError: 'unicode' object has no attribute 'str'

So when I used the map function, it passed only one value from the series at a time instead of the whole series. Can I somehow tell map to pass the series as a whole as an argument instead of one value from it?

If someone comes up with another solution that leads to the same result, I would appreciate it. My first try was a loop that went through every row and returned 1 or 0, but that was extremely slow. The dataframes it will be applied to have 200k+ rows, and the series to search will include about 20k names.

Upvotes: 2

Views: 2573

Answers (2)

Alexander

Reputation: 109636

You can simply use isin. Multiplying the boolean result by 1 converts it to zeros and ones; you could also use .astype(int).

df['Label'] = df.Name.isin(Name_Series) * 1

>>> df
        Name  Label
0  JP2015121      0
1    US14822      1
2    US14358      0
3  JP2015539      0
4  JP2015156      0
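
For reference, here is a minimal self-contained sketch of the same idea using .astype(int) instead of multiplying by 1. The sample values are copied from the head() output in the question, and the lowercase variable name name_series is just for illustration. Note that isin tests for exact matches, whereas the str.contains call in the question would also match substrings.

```python
import pandas as pd

# Sample data copied from the head() output shown in the question
name_series = pd.Series(["US2005642", "US2007961", "US13721", "US2013770", "US14822"])
df = pd.DataFrame({"Name": ["JP2015121", "US14822", "US14358", "JP2015539", "JP2015156"]})

# isin returns a boolean Series (exact matches only);
# astype(int) converts True/False to 1/0
df["Label"] = df["Name"].isin(name_series).astype(int)
print(df)
```

Because isin is vectorised, this avoids calling a Python function once per row, which should matter with 200k+ rows.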

Upvotes: 2

MaxU - stand with Ukraine

Reputation: 210892

Try this:

df['Label'] = 0
df.loc[df['Name'].isin(Name_Series), 'Label'] = 1
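
A runnable sketch of the initialise-then-overwrite approach, assuming the new column is called 'Label' as in the question and using .loc with a boolean mask so that only the matching rows of that one column are changed. The sample data is copied from the question; the names name_series and mask are illustrative.

```python
import pandas as pd

# Sample data copied from the head() output shown in the question
name_series = pd.Series(["US2005642", "US2007961", "US13721", "US2013770", "US14822"])
df = pd.DataFrame({"Name": ["JP2015121", "US14822", "US14358", "JP2015539", "JP2015156"]})

# Start with 0 everywhere
df["Label"] = 0

# Boolean mask: True where the row's Name appears in name_series
mask = df["Name"].isin(name_series)

# Set Label to 1 only in the matching rows; other columns are untouched
df.loc[mask, "Label"] = 1
print(df)
```

Selecting the rows and the column in a single .loc call keeps the assignment on the original dataframe rather than on a temporary copy.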

Upvotes: 0
