Reputation: 1013
I have a DataFrame with a column, name, that holds string data. I want to check whether the entries of this column exist in a reference list. I tried pandas.apply, but it doesn't work.
Sample data:
import pandas as pd
data = [('A', '10'),
        ('B', '10'),
        ('C', '10'),
        ('D', '10'),
        ('E', '20'),
        ('F', '20'),
        ('G', '25')]
data_df = pd.DataFrame(data, columns = ['name', 'value'])
Sample code:
reference = ['A', 'B', 'Z']
def is_in_reference(x, reference):
    if x in reference:
        return 'Yes'
    else:
        return 'No'
data_df['is_in_reference'] = data_df['name'].apply(is_in_reference, args=(reference))
But I get this error:
TypeError: is_in_reference() takes 2 positional arguments but 4 were given
I would appreciate any help with this.
Upvotes: 0
Views: 249
Reputation: 1001
You can use the built-in Series.isin method, which returns a boolean column, as in
data_df['is_in_reference'] = data_df['name'].isin(reference)
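Note that isin yields True/False rather than the 'Yes'/'No' strings from the question. If the labels matter, one possible approach (a sketch, not the only way) is to map the booleans afterwards. The variable in_ref below is just an illustrative name, and the DataFrame is trimmed to the name column for brevity:

```python
import pandas as pd

# Trimmed-down version of the question's data (only the 'name' column)
data_df = pd.DataFrame({'name': ['A', 'B', 'C', 'D', 'E', 'F', 'G']})
reference = ['A', 'B', 'Z']

# isin returns a boolean Series: True where the name is in the list
in_ref = data_df['name'].isin(reference)

# Translate each boolean into the 'Yes'/'No' labels used in the question
data_df['is_in_reference'] = in_ref.map({True: 'Yes', False: 'No'})

print(data_df)
```

This keeps the membership check vectorized and only adds a cheap relabeling step on top.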
But since you asked about apply, the problem is a small yet nefarious Python syntax issue: you must add a trailing comma in the args tuple:
data_df['is_in_reference'] = data_df['name'].apply(is_in_reference, args=(reference,))
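Note the trailing comma in (reference,). Without it, Python does not create a tuple: (reference) is just the list itself wrapped in parentheses, so its three items get passed as three separate arguments. Together with x, that makes the 4 arguments reported in the error.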
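The difference can be seen in plain Python, without pandas. The sketch below assumes that the extra arguments are forwarded roughly as func(x, *args), which is consistent with the error message in the question; the names not_a_tuple and one_tuple are only illustrative:

```python
reference = ['A', 'B', 'Z']

def is_in_reference(x, reference):
    return 'Yes' if x in reference else 'No'

not_a_tuple = (reference)   # parentheses only group; this is still the list
one_tuple = (reference,)    # the trailing comma makes a one-element tuple

print(type(not_a_tuple).__name__)   # list
print(type(one_tuple).__name__)     # tuple

# Unpacking the one-element tuple passes the whole list as one argument
print(is_in_reference('A', *one_tuple))   # Yes

# Unpacking the bare list passes 'A', 'B', 'Z' as three separate arguments
try:
    is_in_reference('A', *not_a_tuple)
except TypeError as exc:
    print(exc)
```

The last call fails with the same kind of TypeError as in the question, because the function receives four positional arguments instead of two.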
Upvotes: 2