Reputation: 405
I need to use a function to calculate a new column for a table using existing data from its 4 columns.
Suppose I have a function that calculates orders, impressions, or clicks - anything from different sources. Something like this:
def claculate_new_columns(complete_orders_a, total_a, completed_orders_b, total_b):
    total = 0.0
    # just some random calculations below - not important
    source_a = complete_orders_a + 1
    test_a = total_a + 1
    source_b = completed_orders_b + 1
    test_b = total_b + 1
    for i in something(smth):
        total += source_a*test_a*source_b*test_b
    return total
How do I use it with data from DataFrame columns?
I want to iterate over the rows and insert the results into a new column. Something like this (it doesn't work, obviously):
old_df['new_column'] = old_df.apply(claculate_new_columns(column1,column2,column3,column4))
I would appreciate the correct way to apply such a function to a DataFrame, using the DataFrame's columns as the function's arguments. What is the correct syntax?
Solutions from StackOverflow don't work for me, probably because I searched for the wrong thing.
Upvotes: 1
Views: 52
Reputation: 103
To do calculations between columns and create a new column inside a function, use apply with axis=1, which passes each row to the function.
For example:
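import pandas as pd

df = pd.DataFrame({'column_1': [1, 2, 3, 4, 5],
                   'column_2': [10, 20, 30, 40, 50]})

def func(row):
    # All calculations here; with axis=1 the argument is a single row (a Series)
    row['new_column'] = row['column_1'] + row['column_2']
    return row

# apply returns a new DataFrame; assign it to a name to keep the result
df.apply(func, axis=1)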
   column_1  column_2  new_column
0         1        10          11
1         2        20          22
2         3        30          33
3         4        40          44
4         5        50          55
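If the calculation is plain arithmetic on whole columns, as in this small example, a row-wise apply may not be needed at all. A minimal sketch, assuming the same two-column frame as above; it only covers cases where the function body can be written as operations on entire columns:

```python
import pandas as pd

df = pd.DataFrame({'column_1': [1, 2, 3, 4, 5],
                   'column_2': [10, 20, 30, 40, 50]})

# Arithmetic between two Series is applied element by element,
# so the new column is computed without calling a function per row
df['new_column'] = df['column_1'] + df['column_2']

print(df['new_column'].tolist())
```

This tends to be faster on large frames, but it does not help when the function contains per-row logic such as loops or branches.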
Upvotes: 1
Reputation: 9619
Use a lambda function:
old_df['new_column'] = old_df.apply(lambda row: claculate_new_columns(row['column1'], row['column2'], row['column3'], row['column4']), axis=1)
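Here is a self-contained sketch of that line in action. The function body and the sample data are hypothetical stand-ins (the question does not show the real calculation in full); only the function name and the column names are taken from the question. The second assignment is an alternative that, as far as I know, gives the same values by iterating over the columns in parallel with zip.

```python
import pandas as pd

# Hypothetical stand-in for the asker's function: the real body is not fully shown
def claculate_new_columns(complete_orders_a, total_a, completed_orders_b, total_b):
    return (complete_orders_a + 1) * (total_a + 1) * (completed_orders_b + 1) * (total_b + 1)

# Made-up sample data using the column names from the question
old_df = pd.DataFrame({'column1': [1, 2],
                       'column2': [3, 4],
                       'column3': [5, 6],
                       'column4': [7, 8]})

# axis=1 passes each row as a Series, so values are looked up by column label
old_df['new_column'] = old_df.apply(
    lambda row: claculate_new_columns(row['column1'], row['column2'],
                                      row['column3'], row['column4']),
    axis=1,
)

# Alternative: walk the four columns in parallel and call the function per row
old_df['new_column_zip'] = [
    claculate_new_columns(a, b, c, d)
    for a, b, c, d in zip(old_df['column1'], old_df['column2'],
                          old_df['column3'], old_df['column4'])
]

print(old_df)
```

The original attempt failed because apply needs a callable, whereas claculate_new_columns(column1, ...) calls the function immediately, with names that are not defined. The lambda defers that call until apply hands it a row.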
Upvotes: 2