Reputation: 35
I asked this question for R a few months back and got a great answer that I used often. Now I'm trying to transition to Python but I was dreading attempting rewriting this code snippet. And now after trying I haven't been able to translate the answer I got (or find anything similar by searching).
The question is: I have a dataframe that I'd like to append new columns to where the calculation is dependent on values in another dataframe which holds the instructions.
I have created a reproducible example below (although in reality there are quite a few more columns and many rows so speed is important and I'd like to avoid a loop if possible):
input dataframes:
import pandas as pd;
data = {"A":["orange","apple","banana"],"B":[5,3,6],"C":[7,12,4],"D":[5,2,7],"E":[1,18,4]}
data_df = pd.DataFrame(data)
key = {"cols":["A","B","C","D","E"],"include":["no","no","yes","no","yes"],"subtract":["na","A","B","C","D"],"names":["na","G","H","I","J"]}
key_df = pd.DataFrame(key)
desired output (same as data but with 2 new columns):
output = {"A":["orange","apple","banana"],"B":[5,3,6],"C":[7,12,4],"D":[5,2,7],"E":[1,18,4],"H":[2,9,-2],"J":[-4,16,-3]}
output_df= pd.DataFrame(output)
So, the key dataframe has 1 row for each column in the base dataframe and it has an "include" column that has to be set to "yes" if any calculation is to be done. When it is set to "yes", then I want to add a new column with a defined name that subtracts a defined column (all lookups from the key dataframe).
For example, column "C" in the base dataframe is included so I want to create a new column called "H" which is the the value from column "C" minus the value from column "B".
p.s. here was the answer from R in case that triggers any thought processes for someone better skillled than me!
k <- subset(key, include == "yes")
output <- cbind(base,setNames(base[k[["cols"]]]-base[k[["subtract"]]],k$names))
Upvotes: 0
Views: 50
Reputation: 28644
Filter for the yes
values in include
:
yes = key_df.loc[key_df.include.eq("yes"), ["cols", "subtract", "names"]]
cols subtract names
2 C B H
4 E D J
Create a dictionary of the yes values and unpack it in the assign method::
yes_values = { name: data_df[col] - data_df[subtract]
for col, subtract, name
in yes.to_numpy()}
data_df.assign(**yes_values)
A B C D E H J
0 orange 5 7 5 1 2 -4
1 apple 3 12 2 18 9 16
2 banana 6 4 7 4 -2 -3
Upvotes: 2