Test if value in a Pandas series is in an array

Question

This seems like the answer will be something obvious but...

I have a series like this:

Dataset['Variable'] = ['a','b','b','a','e','c','d']

and two lists like this:

List_vals1 = ['a','b']
List_vals2 = ['e','c','d']

I want to create 2 new variables in the dataset to see if the value of Dataset['Variable'] is in each list.

Dataset['Var_for_List_vals1'] = ['1','1','1','1','0','0','0']
Dataset['Var_for_List_vals2'] = ['0','0','0','0','1','1','1']

I tried to do this:

Dataset['Var_for_List_vals1'] = (Dataset[Dataset['Variable' in List_vals1]])*1  # times 1 to convert to numeric

and Python did not like that solution.

This seems like an obvious one but nothing seems to work for me. Thanks in advance for the help!

CT Zhu · Accepted Answer

Note that map becomes much slower than numpy.in1d as the data grows:

In [1]:

import pandas as pd
import numpy as np

In [7]:

df = pd.DataFrame({'Variable': ['a','b','b','a','e','c','d']*100})  # 700 rows of data
List_vals1 = ['a','b']
List_vals2 = ['e','c','d']

In [8]:

df['var_for_List_vals1'] = np.in1d(df.Variable, List_vals1)
# returns Boolean values

In [9]:

%timeit np.in1d(df.Variable, List_vals1)
10000 loops, best of 3: 112 µs per loop

In [10]:

%timeit map(lambda x: 1 if x in List_vals1 else 0, df['Variable'])
1000 loops, best of 3: 287 µs per loop

See the numpy.in1d documentation.
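
For the 0/1 columns asked for in the question, a minimal sketch along the same lines is shown below. It assumes the data lives in a DataFrame named df (as in the session above) and uses pandas' Series.isin, which is not part of the original answer, followed by astype(int) to turn the Boolean result into integers. It also shows np.isin, which the NumPy documentation recommends over np.in1d for new code. Integer 1/0 values are produced here rather than the strings '1'/'0' shown in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Variable': ['a', 'b', 'b', 'a', 'e', 'c', 'd']})
List_vals1 = ['a', 'b']
List_vals2 = ['e', 'c', 'd']

# Series.isin returns a Boolean Series; astype(int) maps True/False to 1/0.
df['Var_for_List_vals1'] = df['Variable'].isin(List_vals1).astype(int)
df['Var_for_List_vals2'] = df['Variable'].isin(List_vals2).astype(int)

# np.isin gives the same membership test as a Boolean NumPy array.
mask = np.isin(df['Variable'], List_vals1)

print(df)
```

The attempt in the question likely fails because 'Variable' in List_vals1 is evaluated first as an ordinary Python expression (is the string 'Variable' in the list?), which yields a single False rather than an element-wise test on the column.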
