Reputation: 155
I try to get my head around this problem. I have three dataframes and I would like to merge (concatenate?) two of these dataframes based on values inside a third one. Here are the dataframes:
df1:
index,fields,a1,a2,a3,a4,a5
2018-06-01,price,1.1,2.1,3.1,4.1,5.1
2018-06-01,amount,15,25,35,45,55
2018-06-02,price,1.2,2.2,3.2,4.2,5.2
2018-06-02,amount,16,26,36,46,56
2018-06-03,price,1.3,2.3,3.3,4.3,5.3
2018-06-03,amount,17,27,37,47,57
df2:
index,fields,b1,b2,b3
2018-06-01,clients,1,2,3
2018-06-02,clients,1,2,3
2018-06-03,clients,1,2,3
Columns in df1 and df2 are different but their relationship is in df3.
df3:
index,product1,product2
0,a1,b1
1,a2,b1
2,a3,b2
3,a4,b2
4,a5,b3
I would like to merge the data in df1 and df2 but keep the same columns as in d1 (as b1, b2, b3 are referenced with a1, a2, a3, a4 and a5). Here is df4, the desired dataframe I want.
df4:
index,fields,a1,a2,a3,a4,a5
2018-06-01,price,1.1,2.1,3.1,4.1,5.1
2018-06-01,amount,15,25,35,45,55
2018-06-01,clients,1,1,2,2,3
2018-06-02,price,1.2,2.2,3.2,4.2,5.2
2018-06-02,amount,16,26,36,46,56
2018-06-02,clients,4,4,5,5,6
2018-06-03,price,1.3,2.3,3.3,4.3,5.3
2018-06-03,amount,17,27,37,47,57
2018-06-03,clients,7,7,8,8,9
many thanks in advance,
Upvotes: 0
Views: 2577
Reputation: 3713
Unpivot df2
using df.melt
:
df2_melt = df2.melt(["index", "fields"], var_name="product2")
Drop redundant column index
from reference table df3
and pd.merge
it with melted df2
:
merged = pd.merge(df2_melt, df3.drop("index", axis=1), on="product2")\
.drop("product2", axis=1)
Do pd.pivot_table
from merge result:
new_rows = pd.pivot_table(merged, index=["index", "fields"],
columns="product1", values="value")\
.reset_index()
Add new rows to df1
with pd.concat
, sort rows and reset index:
pd.concat([df1, new_rows]).sort_values("index").reset_index(drop=True)
Result:
product1 index fields a1 a2 a3 a4 a5
0 2018-06-01 price 1.1 2.1 3.1 4.1 5.1
1 2018-06-01 amount 15.0 25.0 35.0 45.0 55.0
2 2018-06-01 clients 1.0 1.0 2.0 2.0 3.0
3 2018-06-02 price 1.2 2.2 3.2 4.2 5.2
4 2018-06-02 amount 16.0 26.0 36.0 46.0 56.0
5 2018-06-02 clients 1.0 1.0 2.0 2.0 3.0
6 2018-06-03 price 1.3 2.3 3.3 4.3 5.3
7 2018-06-03 amount 17.0 27.0 37.0 47.0 57.0
8 2018-06-03 clients 1.0 1.0 2.0 2.0 3.0
Upvotes: 2
Reputation: 76
if you rename df2's columns:
df2 = df2.rename(colunmns={'b1':'a1', 'b2':'a2', 'b3':'a3'})
then you can do a simple concat:
fields = [df1, df2]
df4 = pd.concat(fields)
and you get the desired df4
However in df2, there are only a1-a3, and in df4 there are columns a1-a5, so the rows from df2 will have NaN for a4, a5 unless you create their columns somehow. You can do this by:
df2['a4'] = df2['a1']
... etc
Upvotes: 0