Reputation: 1325
I want to test whether cells that contain lists are equal to [0] while Id==4, and set a new column to 1 when both conditions hold. Input and expected output are below.
I tried several approaches but only succeeded with apply and lambda, which does not scale well for 50k+ rows. Is there a faster way I'm missing?
Input:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Id': [1,2,3,4],
                   'Var1': [[0,1],[0],[6,7],[0]],
                   })
Id Var1
1 [0, 1]
2 [0]
3 [6, 7]
4 [0]
What I've tried:
df['ERR'] = 0
df.loc[(df['Id']==4) & (df['Var1']==[0]) , 'ERR'] = 1 # doesn't work
df.loc[(df['Id']==4) & (df['Var1'].isin([0])) , 'ERR'] = 1 # doesn't work
df['ERR'] = df.apply(lambda x: 1 if x['Id']==4 and x['Var1']==[0] else 0 , axis = 1)
Expected output:
Id Var1 ERR
1 [0, 1] 0
2 [0] 0
3 [6, 7] 0
4 [0] 1
Upvotes: 0
Views: 86
Reputation: 862851
You can convert each list to a tuple or a set and compare that:
df['ERR1'] = ((df['Id']==4) & (df['Var1'].apply(tuple)==(0, ))).astype(int)
df['ERR2'] = ((df['Id']==4) & ([tuple(x) ==(0, ) for x in df['Var1']])).astype(int)
df['ERR3'] = ((df['Id']==4) & (df['Var1'].apply(set)==set([0]))).astype(int)
df['ERR4'] = ((df['Id']==4) & ([set(x) == set([0]) for x in df['Var1']])).astype(int)
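For reference, here is a minimal self-contained sketch of the same idea on the sample data. It is a variation rather than one of the four lines above: it compares each list to [0] directly inside a comprehension and wraps the result in a NumPy array before combining it with the Id mask. The name is_zero_list is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Var1": [[0, 1], [0], [6, 7], [0]],
})

# Plain Python list equality, one check per cell of the list column.
# Wrapping the result in a NumPy array keeps the element-wise & below explicit.
is_zero_list = np.array([x == [0] for x in df["Var1"]])

# Combine with the vectorized Id condition and convert True/False to 1/0.
df["ERR"] = ((df["Id"] == 4) & is_zero_list).astype(int)

print(df["ERR"].tolist())  # [0, 0, 0, 1]
```

One thing to keep in mind when choosing between the variants: the set-based comparisons ignore order and duplicates, so a cell such as [0, 0] would also match set([0]), while the tuple and list comparisons require exactly [0].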
Performance (depends on the input data):
df = pd.DataFrame({'Id': [1,2,3,4],
                   'Var1': [[0,1],[0],[6,7],[0]],
                   })
df = pd.concat([df] * 10000, ignore_index=True)
In [188]: %timeit df['ERR1'] = ((df['Id']==4) & (df['Var1'].apply(tuple)==(0, ))).astype(int)
13.1 ms ± 318 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [189]: %timeit df['ERR2'] = ((df['Id']==4) & ([tuple(x) ==(0, ) for x in df['Var1']])).astype(int)
8.98 ms ± 266 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [190]: %timeit df['ERR3'] = ((df['Id']==4) & (df['Var1'].apply(set)==set([0]))).astype(int)
17 ms ± 451 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [191]: %timeit df['ERR4'] = ((df['Id']==4) & ([set(x) == set([0]) for x in df['Var1']])).astype(int)
19.4 ms ± 93.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
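If you are not working in IPython, a rough equivalent of the timings above can be sketched with the standard timeit module. This is only an assumed harness, not how the numbers above were produced: the helper names row_wise_apply and comprehension_mask are made up for illustration, and the first one reproduces the apply/lambda approach from the question so it can be compared against a comprehension-based mask. Actual timings will vary with the machine, pandas version and data.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({"Id": [1, 2, 3, 4], "Var1": [[0, 1], [0], [6, 7], [0]]})
big = pd.concat([base] * 10000, ignore_index=True)  # 40,000 rows, as above


def row_wise_apply(frame):
    # The approach from the question: one Python-level call per row.
    return frame.apply(
        lambda row: 1 if row["Id"] == 4 and row["Var1"] == [0] else 0, axis=1
    )


def comprehension_mask(frame):
    # List comparison in a comprehension, combined with a vectorized Id mask.
    is_zero_list = np.array([x == [0] for x in frame["Var1"]])
    return ((frame["Id"] == 4) & is_zero_list).astype(int)


# Sanity check: both approaches should flag exactly the same rows.
assert row_wise_apply(big).tolist() == comprehension_mask(big).tolist()

for func in (row_wise_apply, comprehension_mask):
    seconds = timeit.timeit(lambda: func(big), number=3) / 3
    print(f"{func.__name__}: {seconds * 1000:.1f} ms per call")
```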
Upvotes: 2