Reputation: 679
Still trying to figure out how to perform operations with multiple DataFrames from pandas, in Python.
I have the following three dataframes (df1, df2, and df3):
For every user in user_id, I need to use the values in the columns of df2 as an index into 'weeks' in df3, and multiply the looked-up values by the respective values in df1.
E.g.: user 163, column measurements has value 0.0 (from df2). The look-up in df3 at week 0.0 is 2. The final value to be computed for this user/column is 2 (from df1) times 2 = 4.
I need to compute this for all users in user_id and all columns (activity, nutrition, etc.)
Any ideas?
I have been playing with .apply but I find it hard to structure the problem correctly.
Upvotes: 0
Views: 3736
Reputation: 2228
The key, I think, is to put all this data together. You can work with it separately by iterating and going back and forth, but it is much easier and more robust to use pandas' merge functionality, like this:
import pandas as pd
# Minimal data for one user and one column
data1 = {'user_id':[163], 'measurements':[2.0]}
data2 = {'user_id':[163], 'measurements':[0.0]}
data3 = {'weeks':[0.0], 'measurements':[2.0]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
# Pair each user's df1 value with the week stored in df2
df = df1.merge(df2, on='user_id', how='outer', suffixes=['_df1', '_df2'])
# Look up the df3 row whose 'weeks' matches the df2 value
df = df.merge(df3, left_on='measurements_df2', right_on='weeks',
              how='outer', suffixes=['', '_df3'])
# Multiply the df1 value by the looked-up df3 value
df['new_val'] = df['measurements_df1'] * df['measurements']
In [13]: df
Out[13]:
   measurements_df1  user_id  measurements_df2  measurements  weeks  new_val
0               2.0      163               0.0           2.0    0.0      4.0
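The example above handles a single column. To cover every column for every user in one pass, one option is to reshape all three frames to long format before merging. This is only a sketch under assumptions: it supposes df3 has one look-up column per category, named the same as in df1 and df2, and the 'activity' column, user 164, and the names `category`, `value`, `factor`, and `tidy` are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the 'activity' column and user 164 are invented for this sketch.
df1 = pd.DataFrame({'user_id': [163, 164],
                    'measurements': [2.0, 3.0],
                    'activity': [1.0, 5.0]})
df2 = pd.DataFrame({'user_id': [163, 164],
                    'measurements': [0.0, 1.0],
                    'activity': [1.0, 0.0]})
df3 = pd.DataFrame({'weeks': [0.0, 1.0],
                    'measurements': [2.0, 4.0],
                    'activity': [10.0, 20.0]})

# Long format: one row per (user_id, category) or per (weeks, category).
values = df1.melt(id_vars='user_id', var_name='category', value_name='value')
weeks = df2.melt(id_vars='user_id', var_name='category', value_name='weeks')
lookup = df3.melt(id_vars='weeks', var_name='category', value_name='factor')

# Attach each user's week per category, then the df3 value for that week.
tidy = values.merge(weeks, on=['user_id', 'category'])
tidy = tidy.merge(lookup, on=['weeks', 'category'], how='left')
tidy['new_val'] = tidy['value'] * tidy['factor']

# Back to wide format: one row per user, one column per category.
result = tidy.pivot(index='user_id', columns='category', values='new_val')
```

With this sample data, result.loc[163, 'measurements'] is 4.0, matching the worked example in the question. The left merge keeps users whose week has no match in df3; their new_val comes out as NaN rather than being dropped.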
In the future it's much easier if you give us a reproducible example to work with, especially if you can include errors from what you tried, but in this case I know what you mean about it being hard to figure out how to structure the question properly. I strongly suggest the book by the creator of pandas, Wes McKinney (Python for Data Analysis).
Upvotes: 2