Reputation: 35
I have two data frames with a similar shape to:
df1 = pd.DataFrame([[3.2,5.8,46],[3.5,4.4,50],[5.4,6.7,40]], index = ['sample1','sample2','sample3'], columns = ['L1','L2','L3'])
          L1   L2  L3
sample1  3.2  5.8  46
sample2  3.5  4.4  50
sample3  5.4  6.7  40
df2 = pd.DataFrame([[0.02,0.03,0.04,0.05,0.06],[0.2, 0.3, 0.4, 0.5, 0.7],[2, 3, 4, 5, 7]])
      0     1     2     3     4
0  0.02  0.03  0.04  0.05  0.06
1  0.20  0.30  0.40  0.50  0.70
2  2.00  3.00  4.00  5.00  7.00
I would like to multiply the first row in df2 by the L1 value for sample1 (3.2) in df1, the second row in df2 by the L2 value for sample1 (5.8), and the third row in df2 by the L3 value for sample1 (46). I would then need to repeat this for sample2 (i.e., row 1 by the L1 value for sample2, row 2 by the L2 value for sample2, and row 3 by the L3 value for sample2), and so on for each sample (my actual dataset has hundreds of samples). The output should be a new dataframe, either one per sample or a single one covering all samples. How should I set up the code for this?
Upvotes: 0
Views: 642
Reputation: 674
Something like this:
sample_lists = {}
for df1_index, df1_row in df1.iterrows():
    sample = df1_index
    print(f'\nPROCESSING SAMPLE {sample}')
    df1_row = df1_row.tolist()
    sample_list = []
    # enumerate gives the position directly; list.index(value) would return
    # the first match and pick the wrong df2 row if a sample has repeated values
    for index_number, value in enumerate(df1_row):
        df2_row = df2.iloc[index_number, :].tolist()
        print(f'Multiplying {df2_row} with {value}')
        int_list = [v*value for v in df2_row]
        sample_list.append(int_list)
    sample_lists[sample] = sample_list
print(f'\nFINAL OUTPUT: {sample_lists}')
Feel free to remove the print statements. You can then use this dict to create a dataframe.
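As a rough sketch of that last step, the dict could be turned into a single dataframe with pd.concat. The level names "sample" and "factor" are my own labels, not something from the question, and the dict is rebuilt here with a comprehension only so the snippet runs on its own:

```python
import pandas as pd

df1 = pd.DataFrame([[3.2, 5.8, 46], [3.5, 4.4, 50], [5.4, 6.7, 40]],
                   index=["sample1", "sample2", "sample3"],
                   columns=["L1", "L2", "L3"])
df2 = pd.DataFrame([[0.02, 0.03, 0.04, 0.05, 0.06],
                    [0.2, 0.3, 0.4, 0.5, 0.7],
                    [2, 3, 4, 5, 7]])

# Same structure as the dict built by the loop: {sample: [scaled row 0, row 1, row 2]}
sample_lists = {
    sample: [[v * value for v in df2.iloc[i].tolist()]
             for i, value in enumerate(row.tolist())]
    for sample, row in df1.iterrows()
}

# One small frame per sample, labelled by the df1 column that scaled each row
frames = {sample: pd.DataFrame(rows, index=df1.columns)
          for sample, rows in sample_lists.items()}

# Stack them; the dict keys become the outer index level
# ("sample" and "factor" are assumed names)
result = pd.concat(frames, names=["sample", "factor"])
```

result.loc["sample2"] then gives the three scaled rows for that sample, so you can keep everything in one frame or split it per sample as needed.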
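Explanation:
1. Iterate over each row of df1 and convert that row to a list.
2. For each value in that list, get the position of the value. This is done so that you can get the row at the matching position in df2, which is the next step.
3. Take that row of df2 and multiply each of its elements by the value.
4. Store the resulting lists under the index from df1 (sample1, sample2, etc.).
Pretty certain you can use lambda and apply to simplify the code above.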
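I have not worked out the lambda/apply version, but here is a sketch of a different simplification using NumPy broadcasting instead, which avoids the Python loops entirely. It assumes df1 has exactly as many columns as df2 has rows, and the index level names "sample" and "factor" are again my own labels:

```python
import pandas as pd

df1 = pd.DataFrame([[3.2, 5.8, 46], [3.5, 4.4, 50], [5.4, 6.7, 40]],
                   index=["sample1", "sample2", "sample3"],
                   columns=["L1", "L2", "L3"])
df2 = pd.DataFrame([[0.02, 0.03, 0.04, 0.05, 0.06],
                    [0.2, 0.3, 0.4, 0.5, 0.7],
                    [2, 3, 4, 5, 7]])

# df1 values: (n_samples, 3); df2 values: (3, 5).
# A trailing axis on df1 gives (n_samples, 3, 1), which broadcasts against (3, 5)
# so that each df2 row is scaled by the matching L1/L2/L3 value of each sample.
scaled = df1.to_numpy()[:, :, None] * df2.to_numpy()  # (n_samples, 3, 5)

# Flatten to one row per (sample, factor) pair, in the same sample-major order
index = pd.MultiIndex.from_product([df1.index, df1.columns],
                                   names=["sample", "factor"])
out = pd.DataFrame(scaled.reshape(-1, df2.shape[1]),
                   index=index, columns=df2.columns)
```

With hundreds of samples this should be noticeably faster than iterrows, since the multiplication happens in a single array operation.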
Upvotes: 0
Reputation: 467
Please check the following code
column_list = df1.columns
sample_list = df1.index
# Loop over samples and columns, collecting one scaled df2 row per pair
rows = []
for sample in sample_list:
    for ind, column in enumerate(column_list):
        multiply_by_sample = df2.iloc[ind] * df1.loc[sample, column]
        rows.append(multiply_by_sample)
# DataFrame.append was removed in pandas 2.0, so build the frame once at the end
new_df = pd.DataFrame(rows).reset_index(drop=True)
Upvotes: 1