Reputation: 31
I'm pretty new to Python and pandas and I have a small problem. I searched the forums and Google but couldn't find a solution, so here we go:
I have a series that contains unique names:
In [8]: Name_Series.head()
Out[8]:
0 US2005642
1 US2007961
2 US13721
3 US2013770
4 US14822
dtype: object
In my dataframe there is a column that contains one name per row.
In [5]: df.Name.head()
Out[5]:
0 JP2015121
1 US14822
2 US14358
3 JP2015539
4 JP2015156
Name: AppNo, dtype: object
What I need is a new column 'Label' that contains a 1 if Name is contained in Name_Series and a 0 if it is not.
My idea was to write a function that returns 1 or 0 and apply it to the dataframe:
def Label(Name_Series, Name):
    if Name_Series.str.contains(Name).sum() > 0:
        return 1
    else:
        return 0

df['Label'] = list(map(Label, Name_Series, df.Name))
Unfortunately this leads to the following error:
In [9]: df['Label'] = list(map(Label, Name_Series, df.Name))
Traceback (most recent call last):
  File "<ipython-input-9-713d2d55d303>", line 1, in <module>
    df['Label'] = list(map(Label, Name_Series, df.Name))
  File "Test.py", line 60, in Label
    if Name_Series.str.contains(Name).sum()>0:
AttributeError: 'unicode' object has no attribute 'str'
So when I used the map function, it took only one value out of the series instead of the whole series. Can I somehow tell map to take the series as a whole as an argument instead of one value at a time?
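For illustration, here is a minimal sketch of what map does with two iterables and one way to hand the whole series to the function on every call. The sample data mirrors the head() output above; the lowercase names name_series and label are made up for this sketch, and the check uses exact equality rather than str.contains, which is an assumption about the intended matching.

```python
import pandas as pd

# Sample data copied from the head() output in the question
name_series = pd.Series(["US2005642", "US2007961", "US13721", "US2013770", "US14822"])
df = pd.DataFrame({"Name": ["JP2015121", "US14822", "US14358", "JP2015539", "JP2015156"]})

# map() with two iterables walks them in lockstep, like zip(),
# so the function receives one string from each, not the whole series
pairs = list(map(lambda a, b: (a, b), name_series, df["Name"]))

# Binding the series as a default argument passes it whole on every call
def label(name, names=name_series):
    # Exact match (assumption); returns 1 if the name occurs anywhere in names
    return int((names == name).any())

labels = list(map(label, df["Name"]))
```

This still calls a Python function once per row, so it is likely to stay slow on 200k+ rows; it only shows why the AttributeError appears.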
If someone comes up with another solution that leads to the same result, I would appreciate it. My first try was a loop that went through every row and returned 1 or 0, but that was extremely slow. The dataframes it will be applied to have 200k+ rows, and the series to search will include about 20k names.
Upvotes: 2
Views: 2573
Reputation: 109636
You can simply use isin. Multiplying the boolean result by 1 converts it to zeros and ones; you could also use .astype(int).
df['Label'] = df.Name.isin(Name_Series) * 1
>>> df
Name Label
0 JP2015121 0
1 US14822 1
2 US14358 0
3 JP2015539 0
4 JP2015156 0
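A self-contained sketch of the .astype(int) variant, rebuilt from the sample rows shown in the question (the variable names name_series and mask are chosen here for illustration):

```python
import pandas as pd

# Sample data copied from the head() output in the question
name_series = pd.Series(["US2005642", "US2007961", "US13721", "US2013770", "US14822"])
df = pd.DataFrame({"Name": ["JP2015121", "US14822", "US14358", "JP2015539", "JP2015156"]})

# isin() returns a boolean Series: True where the name occurs in name_series
mask = df["Name"].isin(name_series)

# Cast the booleans to integers (True -> 1, False -> 0)
df["Label"] = mask.astype(int)
```

Because isin is vectorized and tests exact membership, it avoids the per-row Python call that made the loop slow.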
Upvotes: 2
Reputation: 210892
Try this:
df['Label'] = 0
df.loc[df['Name'].isin(Name_Series), 'Label'] = 1
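A minimal sketch of the initialize-then-overwrite idea on made-up data, assuming the write should touch only the label column. The extra Year column is hypothetical and is there only to show that other columns stay unchanged when the column label is given to .loc.

```python
import pandas as pd

# Hypothetical sample data; "Year" exists only to show it is left untouched
names = pd.Series(["US2005642", "US14822"])
df = pd.DataFrame({
    "Name": ["JP2015121", "US14822", "US14358"],
    "Year": [2015, 2014, 2014],
})

# Start with 0 everywhere, then set 1 only on matching rows of one column
df["Label"] = 0
df.loc[df["Name"].isin(names), "Label"] = 1
```

Selecting the rows with a boolean mask but omitting the column label would assign to every column of the matching rows, which is why the column name is passed as the second argument to .loc.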
Upvotes: 0