roberto swiss
roberto swiss

Reputation: 155

Adding row values to a dataframe based on matching column labels

I try to get my head around this problem. I have three dataframes and I would like to merge (concatenate?) two of these dataframes based on values inside a third one. Here are the dataframes:

df1:

index,fields,a1,a2,a3,a4,a5
2018-06-01,price,1.1,2.1,3.1,4.1,5.1
2018-06-01,amount,15,25,35,45,55
2018-06-02,price,1.2,2.2,3.2,4.2,5.2
2018-06-02,amount,16,26,36,46,56
2018-06-03,price,1.3,2.3,3.3,4.3,5.3
2018-06-03,amount,17,27,37,47,57

df2:

index,fields,b1,b2,b3
2018-06-01,clients,1,2,3
2018-06-02,clients,1,2,3
2018-06-03,clients,1,2,3

Columns in df1 and df2 are different but their relationship is in df3.

df3:

index,product1,product2
0,a1,b1
1,a2,b1
2,a3,b2
3,a4,b2
4,a5,b3

I would like to merge the data in df1 and df2 but keep the same columns as in d1 (as b1, b2, b3 are referenced with a1, a2, a3, a4 and a5). Here is df4, the desired dataframe I want.

df4:

 index,fields,a1,a2,a3,a4,a5
 2018-06-01,price,1.1,2.1,3.1,4.1,5.1
 2018-06-01,amount,15,25,35,45,55
 2018-06-01,clients,1,1,2,2,3
 2018-06-02,price,1.2,2.2,3.2,4.2,5.2
 2018-06-02,amount,16,26,36,46,56
 2018-06-02,clients,4,4,5,5,6
 2018-06-03,price,1.3,2.3,3.3,4.3,5.3
 2018-06-03,amount,17,27,37,47,57
 2018-06-03,clients,7,7,8,8,9

many thanks in advance,

Upvotes: 0

Views: 2577

Answers (2)

koPytok
koPytok

Reputation: 3713

Unpivot df2 using df.melt:

df2_melt = df2.melt(["index", "fields"], var_name="product2")

Drop redundant column index from reference table df3 and pd.merge it with melted df2:

merged = pd.merge(df2_melt, df3.drop("index", axis=1), on="product2")\
    .drop("product2", axis=1)

Do pd.pivot_table from merge result:

new_rows = pd.pivot_table(merged, index=["index", "fields"],
                          columns="product1", values="value")\
    .reset_index()

Add new rows to df1 with pd.concat, sort rows and reset index:

pd.concat([df1, new_rows]).sort_values("index").reset_index(drop=True)

Result:

product1    index       fields  a1      a2      a3      a4      a5
0           2018-06-01  price   1.1     2.1     3.1     4.1     5.1
1           2018-06-01  amount  15.0    25.0    35.0    45.0    55.0
2           2018-06-01  clients 1.0     1.0     2.0     2.0     3.0
3           2018-06-02  price   1.2     2.2     3.2     4.2     5.2
4           2018-06-02  amount  16.0    26.0    36.0    46.0    56.0
5           2018-06-02  clients 1.0     1.0     2.0     2.0     3.0
6           2018-06-03  price   1.3     2.3     3.3     4.3     5.3
7           2018-06-03  amount  17.0    27.0    37.0    47.0    57.0
8           2018-06-03  clients 1.0     1.0     2.0     2.0     3.0

Upvotes: 2

Chris
Chris

Reputation: 76

if you rename df2's columns:

df2 = df2.rename(colunmns={'b1':'a1', 'b2':'a2', 'b3':'a3'})

then you can do a simple concat:

fields = [df1, df2]
df4 = pd.concat(fields)

and you get the desired df4

However in df2, there are only a1-a3, and in df4 there are columns a1-a5, so the rows from df2 will have NaN for a4, a5 unless you create their columns somehow. You can do this by:

df2['a4'] = df2['a1']

... etc

Upvotes: 0

Related Questions