Reputation: 747
I'm converting code from R to Python and am looking for some help with mutating a new column based on other columns, using dfply syntax/piping.
In this example, I want to subtract 2 from col1 if col2 is 'c', otherwise add 4.
import pandas as pd
import numpy as np
from dfply import *
col1 = [1,2,3,4,5]
col2 = ['a', 'b', 'c', 'd', 'e']
df = pd.DataFrame(data = {'col1': col1, 'col2': col2})
In R I would do:
df_new <- df %>%
  mutate(newCol = ifelse(col2 == 'c', col1 - 2, col1 + 4))
but Python doesn't seem to like this:
new_df = (df >>
          mutate(newCol = np.where(X.col2 == 'c', X.col1 - 2, X.col1 + 4)))
This fails with the error "invalid __array_struct__".
Note that this works fine:
new_df = (df >>
          mutate(newCol = X.col1 - 2))
Upvotes: 1
Views: 1898
Reputation: 6663
For scalar values, the Python equivalent here would be an inline if-else expression (also called a conditional or ternary expression):
ifelse(col2 == 'c', col1 - 2, col1 + 4)
Would then become
col1 - 2 if col2 == 'c' else col1 + 4
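One caveat worth illustrating: unlike R's ifelse, the inline if-else is not vectorized, so it works on one pair of values at a time rather than on whole columns. The sketch below rebuilds the question's data and uses a helper named new_value (a hypothetical name, not from the question) to show both cases; it uses plain pandas rather than dfply.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': ['a', 'b', 'c', 'd', 'e']})

def new_value(col1, col2):
    # Inline if-else: evaluates a single condition to a single result.
    return col1 - 2 if col2 == 'c' else col1 + 4

# Works when called once per pair of scalar values.
scalar_results = [new_value(c1, c2) for c1, c2 in zip(df['col1'], df['col2'])]

# Fails when handed whole columns, because `if` needs one True/False value
# and a boolean Series does not have a single truth value.
try:
    new_value(df['col1'], df['col2'])
    series_error = None
except ValueError as exc:
    series_error = str(exc)

print(scalar_results)
print(series_error)
```

So the expression is fine inside a per-row function or a list comprehension, but it cannot be dropped in as a direct replacement for ifelse on columns.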
Upvotes: 0
Reputation: 12704
I would use apply with a lambda function. X is a single row of the dataframe, and axis=1 means the lambda is applied to each row (rather than to each column).
df['newCol'] = df.apply(lambda X: X.col1 - 2 if X.col2 == 'c' else X.col1 + 4, axis=1)
df
   col1 col2  newCol
0     1    a       5
1     2    b       6
2     3    c       1
3     4    d       8
4     5    e       9
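If leaving the dfply pipe is acceptable, np.where applied directly to the dataframe's columns should give the same result without calling a Python function per row. This is a minimal sketch in plain pandas and NumPy, not a fix for the dfply error itself; it assumes the same df as in the question.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': ['a', 'b', 'c', 'd', 'e']})

# np.where(condition, value_if_true, value_if_false) works element-wise
# on real Series, which is the closest match to R's ifelse.
df['newCol'] = np.where(df['col2'] == 'c', df['col1'] - 2, df['col1'] + 4)

print(df)
```

The difference from the failing attempt in the question is that df['col2'] and df['col1'] are actual Series here, whereas X.col2 and X.col1 inside a dfply pipe are deferred placeholders that np.where cannot evaluate.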
Upvotes: 4