Reputation: 1587
I have a DataFrame like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': ['a', 'b', 'c'],
                   'var2': [[], [1, 2, 3], [2, 3, 4]]})
I would like to create a third column which gives the value in var1 if the corresponding list in var2 is empty, and the first element of the list in var2 otherwise. So my intended result is:
target = pd.DataFrame({'var1': ['a', 'b', 'c'],
                       'var2': [[], [1, 2, 3], [2, 3, 4]],
                       'var3': ['a', 1, 2]})
I've tried using np.where like this:
df['var3'] = np.where(len(df['var2'])>0 , df['var2'][0], df['var1'])
But it seems to be checking the length of the whole column rather than the length of the list within each row of the column. How can I get it to apply the condition to each row?
I have the same problem when I use bool(df['var2']) as my condition.
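To make the symptom concrete, here is a minimal sketch (assuming pandas is imported as pd, and using illustrative variable names) of what len and bool actually see when they are handed the whole column:

```python
import pandas as pd

df = pd.DataFrame({'var1': ['a', 'b', 'c'],
                   'var2': [[], [1, 2, 3], [2, 3, 4]]})

# len() on a Series counts its rows, so the condition is one scalar
n_rows = len(df['var2'])

# df['var2'][0] is the list stored in the first row,
# not the first element of every list
first_row = df['var2'][0]

# bool() on a multi-element Series raises rather than evaluating per row
try:
    bool(df['var2'])
    bool_error = None
except ValueError as exc:
    bool_error = exc
```

So the condition passed to np.where is a single value for the whole column, which is why no per-row decision is made.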
Upvotes: 1
Views: 2739
Reputation: 67
This may look like reviving an old post, but I would prefer np.where, because it is vectorized, over a list comprehension (too slow) or apply. Many online tutorials explain the mechanism in depth, like here.
Upvotes: 0
Reputation: 429
You could use a list comprehension:
v3 = [row['var1'] if len(row['var2']) == 0 else row['var2'][0]
      for i, row in df.iterrows()]
df['var3'] = v3
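As a variation on the comprehension above, here is a sketch that walks the two columns with zip instead of iterrows, so no Series is built for each row. Whether the difference matters for your data size is an assumption worth measuring.

```python
import pandas as pd

df = pd.DataFrame({'var1': ['a', 'b', 'c'],
                   'var2': [[], [1, 2, 3], [2, 3, 4]]})

# An empty list is falsy, so fall back to var1 when the list has no elements
df['var3'] = [lst[0] if lst else v1
              for v1, lst in zip(df['var1'], df['var2'])]
```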
Alternatively, you could use apply instead of where to run a function on every row of the DataFrame.
First, you need a function to pass to apply:
def f(row):
    if len(row['var2']) == 0:
        return row['var1']
    else:
        return row['var2'][0]
Then apply it:
df['var3']= df.apply(f,axis=1)
Upvotes: 1
Reputation: 153500
Let's use the .str accessor and len:
df['var'] = np.where(df.var2.str.len() > 0, df.var2.str[0], df.var1)
Output:
  var1       var2 var
0    a         []   a
1    b  [1, 2, 3]   1
2    c  [2, 3, 4]   2
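For reference, a self-contained sketch of the same approach, assuming the DataFrame from the question (the names lengths and firsts are only illustrative). .str.len() returns one length per row and .str[0] takes the first element of each list, leaving a missing value where the list is empty, so np.where receives three inputs of equal length:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': ['a', 'b', 'c'],
                   'var2': [[], [1, 2, 3], [2, 3, 4]]})

lengths = df['var2'].str.len()   # per-row list length
firsts = df['var2'].str[0]       # first element per row; missing for an empty list

# Pick the first list element where the list is non-empty, else var1
df['var'] = np.where(lengths > 0, firsts, df['var1'])
print(df)
```

Note that the .str methods still visit each element one at a time internally, so this is mainly a more compact spelling; it is worth timing against the other answers before assuming a speedup.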
Upvotes: 4