Reputation: 1741
How do I assign columns in my dataframe to be equal to another column if/where a condition is met?
Update
The problem
I need to assign values to many columns (sometimes a value taken from another column in the same row) when the condition is met.
The condition is not the problem.
I need an efficient way to do this:
df.loc[cond,  # cond: any boolean mask; the condition itself doesn't matter
       ['a','b','c','d','e','f','g','x','y']] = df['z'],1,3,4,5,6,7,8,df['p']
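One way to get the single statement the pseudocode above is after is to pack the right-hand sides into a DataFrame whose column labels match the targets, because `.loc` aligns a DataFrame value on both index and column labels. A minimal sketch, assuming all target columns already exist (here created as empty object columns); `cond` is a stand-in mask and the `p0`/`p1` values are made up for illustration:

```python
import pandas as pd

targets = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'x', 'y']

# Made-up demo frame: source columns 'z' and 'p', plus empty target columns
df = pd.DataFrame({'z': ['x', 'x'], 'p': ['p0', 'p1']})
for col in targets:
    df[col] = None

# Stand-in for "some condition": only row 0 qualifies
cond = pd.Series([True, False], index=df.index)

# One column of values per target: Series are copied row by row,
# scalars are broadcast down the whole column by the constructor
values = pd.DataFrame({'a': df['z'], 'b': 1, 'c': 3, 'd': 4, 'e': 5,
                       'f': 6, 'g': 7, 'x': 8, 'y': df['p']},
                      index=df.index)

# Labels on both sides match, so .loc aligns them correctly in one statement
df.loc[cond, targets] = values.loc[cond, targets]
print(df)
```

Rows where `cond` is False are left untouched.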
Simplified example data
import pandas as pd
d = {'var' : pd.Series([10,61]),
'c' : pd.Series([100,0]),
'z' : pd.Series(['x','x']),
'y' : pd.Series([None,None]),
'x' : pd.Series([None,None])}
df=pd.DataFrame(d)
Condition: var is not missing and its first digit is less than 5.
Result: set df.x = df.z and df.y = 1 on the rows that meet the condition.
Here is pseudocode that doesn't work, but it is what I want:
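A vectorized version of this condition can be sketched as below. `leading_digit_below` is a hypothetical helper name, and it assumes `var` holds non-negative numbers (a leading minus sign would break the first-character trick). It reindexes the result back to the full index with `False`, because `.loc` can reject a boolean Series that does not cover every row, which is what `dropna()` produces once values are missing:

```python
import pandas as pd

def leading_digit_below(s: pd.Series, limit: int) -> pd.Series:
    """Hypothetical helper: True where s is present and its first digit < limit.

    Assumes non-negative numbers; missing values always give False.
    """
    present = s.dropna()
    first = present.astype(str).str[0].astype(int)
    # Put the dropped (missing) rows back as False so the mask covers every row
    return (first < limit).reindex(s.index, fill_value=False).astype(bool)


df = pd.DataFrame({'var': [10, 61, None], 'z': ['x', 'x', 'x']})
mask = leading_digit_below(df['var'], 5)
print(mask.tolist())  # [True, False, False]
```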
df.loc[((df['var'].dropna().astype(str).str[0].astype(int) < 5)),
['x','y']]=df['z'],1
but I get:
ValueError: cannot set using a list-like indexer with a different length than the value
Ideal output:
c var x z y
0 100 10 x x 1
1 0 61 None x None
The code below works, but it is too inefficient because I need to assign values to many columns, one statement per column:
df.loc[((df['var'].dropna().astype(str).str[0].astype(int) < 5)),
['x']]=df['z']
df.loc[((df['var'].dropna().astype(str).str[0].astype(int) < 5)),
['y']]=1
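When the number of target columns grows, one option is to keep the column-to-value pairs in a dict and loop over them, so the code no longer grows by one statement per column. Each `df.loc[mask, col] = val` call is still vectorized over the rows: a Series value is aligned on the index and a scalar is broadcast. `assign_where` is a hypothetical helper name, and the sample mask skips the missing-value check only because this data has no NaN:

```python
import pandas as pd

def assign_where(df, mask, values):
    """Hypothetical helper: set df.loc[mask, col] = val for every (col, val) pair.

    A Series val is aligned on the index (each row gets its own value);
    a scalar val is broadcast to every selected row.
    """
    for col, val in values.items():
        df.loc[mask, col] = val
    return df


df = pd.DataFrame({'var': [10, 61], 'c': [100, 0], 'z': ['x', 'x'],
                   'y': [None, None], 'x': [None, None]})
mask = df['var'].astype(str).str[0].astype(int) < 5  # no NaN in this sample
assign_where(df, mask, {'x': df['z'], 'y': 1})
print(df)
```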
Upvotes: 2
Views: 8827
Reputation: 41013
You can work row-wise:
import pandas as pd

def f(row):
    # pd.notna catches both None and NaN; "is not None" would let NaN through
    if pd.notna(row['var']) and int(str(row['var'])[0]) < 5:
        row[['x', 'y']] = row['z'], 1
    return row
>>> df.apply(f, axis=1)
c var x y z
0 100 10 x 1 x
1 0 61 None NaN x
To overwrite the original df:
df = df.apply(f, axis=1)
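`apply` with `axis=1` calls the Python function once per row, which tends to be slow on large frames. A vectorized alternative, swapped in here and not part of this answer, is `numpy.where`, which picks element-wise between two arrays. A minimal sketch on the same sample data, assuming non-negative values in `var`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'var': [10, 61], 'c': [100, 0], 'z': ['x', 'x'],
                   'y': [None, None], 'x': [None, None]})

# Same test as f(): var present and first digit < 5 (False where var is missing)
first = df['var'].dropna().astype(str).str[0].astype(int)
mask = (first < 5).reindex(df.index, fill_value=False).astype(bool)

# Where mask is True take the new value, otherwise keep the current one
df['x'] = np.where(mask, df['z'], df['x'])
df['y'] = np.where(mask, 1, df['y'])
print(df)
```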
Upvotes: 2
Reputation: 14748
This is one way of doing it:
import pandas as pd
import numpy as np
d = {'var' : pd.Series([1,6]),
'c' : pd.Series([100,0]),
'z' : pd.Series(['x','x']),
'y' : pd.Series([None,None]),
'x' : pd.Series([None,None])}
df = pd.DataFrame(d)
# Condition 1: if var is not missing
cond1 = ~df['var'].apply(np.isnan)
# Condition 2: first number is less than 5
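cond2 = df['var'].apply(lambda x: pd.notna(x) and int(str(x)[0]) < 5)  # guard NaN: str(nan)[0] is 'n'
mask = cond1 & cond2
# .ix was removed in pandas 1.0; .loc does the same label-based selection here
df.loc[mask, 'x'] = df.loc[mask, 'z']
df.loc[mask, 'y'] = 1
print(df)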
Output:
c var x y z
0 100 1 x 1 x
1 0 6 None None x
As you can see, the Boolean mask is applied on both sides of the assignment, and the value 1 is broadcast along the y column. It is probably cleaner to split the steps into separate lines.
Edit, after the question was updated: more generally, since some assignments copy values from other columns and others just broadcast a constant down the column, you can do it in two steps:
df.loc[conds, ['a','y']] = df.loc[conds, ['z','p']].to_numpy()  # drop labels so 'z'->'a' and 'p'->'y' pair by position
df.loc[conds, ['b','c','d','e','f','g','x']] = [1,3,4,5,6,7,8]
You may want to profile this to see whether it is efficient enough for your use case.
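A runnable sketch of the two-step approach on made-up data (the `p` column and its `p0`/`p1` values are invented here, and the target columns are created empty up front). One caveat: `.loc` aligns a DataFrame on the right-hand side by column label, so copying `['z','p']` into `['a','y']` without `.to_numpy()` would leave `a` and `y` as NaN, since the labels don't match. `.to_numpy()` strips the labels so the columns pair up by position:

```python
import pandas as pd

# Made-up frame containing every column named in the question
cols = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'x', 'y']
df = pd.DataFrame({'var': [10, 61], 'z': ['x', 'x'], 'p': ['p0', 'p1']})
for col in cols:
    df[col] = None

conds = df['var'].astype(str).str[0].astype(int) < 5  # True for row 0 only

# Step 1: copy from other columns; .to_numpy() pairs 'z'->'a', 'p'->'y' by position
df.loc[conds, ['a', 'y']] = df.loc[conds, ['z', 'p']].to_numpy()

# Step 2: broadcast one constant per column down the selected rows
df.loc[conds, ['b', 'c', 'd', 'e', 'f', 'g', 'x']] = [1, 3, 4, 5, 6, 7, 8]
print(df)
```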
Upvotes: 2