Reputation: 515
I have a detailed Transaction data
Dataframe as below
df_col = pd.DataFrame({'SQ':[1,1,2],
'City':['A','A','B'],
'Date':['7-1-2020','7-2-2020','7-1-2020'],
'Loc 1':[40,21,27],
'Loc 2':[37,40,14],
'Loc 3':[49,38,36],
'Loc 4':[20,14,18],
'Loc 5':[48,27,36]})
+----+------+----------+-------+-------+-------+-------+-------+
| SQ | City | Date | Loc 1 | Loc 2 | Loc 3 | Loc 4 | Loc 5 |
+----+------+----------+-------+-------+-------+-------+-------+
| 1 | A | 7-1-2020 | 40 | 37 | 49 | 20 | 48 |
+----+------+----------+-------+-------+-------+-------+-------+
| 1 | A | 7-2-2020 | 21 | 40 | 38 | 14 | 27 |
+----+------+----------+-------+-------+-------+-------+-------+
| 2 | B | 7-1-2020 | 27 | 14 | 36 | 18 | 36 |
+----+------+----------+-------+-------+-------+-------+-------+
Also I've a separate Location data
DataFrame as below
+------------+--------------+
| LocationNo | LocationType |
+------------+--------------+
| Loc 1 | Class A |
+------------+--------------+
| Loc 2 | Class A |
+------------+--------------+
| Loc 3 | Class B |
+------------+--------------+
| Loc 4 | Class C |
+------------+--------------+
| Loc 5 | Class C |
+------------+--------------+
df_row = pd.DataFrame({'LocationNo':['Loc 1','Loc 2','Loc 3','Loc 4','Loc 5'],
'LocationType':['Class A', 'Class A', 'Class B', 'Class C', 'Class C']
})
Now my task is to merge
columns of df_col
to rows in df_row
and sum the values.
merging Vertical to Horizontal
i.e Columns to Rows
.
my desired output is as below
+----+------+----------+---------+---------+---------+
| SQ | City | Date | Class A | Class B | Class C |
+----+------+----------+---------+---------+---------+
| 1 | A | 7-1-2020 | 77 | 49 | 68 |
+----+------+----------+---------+---------+---------+
| 1 | A | 7-2-2020 | 61 | 38 | 41 |
+----+------+----------+---------+---------+---------+
| 2 | B | 7-1-2020 | 41 | 36 | 54 |
+----+------+----------+---------+---------+---------+
I wrote the below code
# setting the index
df_col.set_index(['SQ','City','Date'], inplace=True)
df_row.set_index('LocationNo', inplace=True)
# I tried to merge vertically columns to columns. Hence, transpose the df_col.T
df_final = df_col.T.merge(df_row, left_index=True, right_index=True, how='left').groupby('LocationType').agg('sum').T
What result outputted from the above code is as below
+------------------+-----------+-----------+-----------+
| LocationType | Class A | Class B | Class C |
+------------------+-----------+-----------+-----------+
| (1, A, 7-1-2020) | 77 | 49 | 68 |
+------------------+-----------+-----------+-----------+
| (1, A, 7-1-2020) | 61 | 38 | 41 |
+------------------+-----------+-----------+-----------+
| (2, B, 7-2-2020) | 41 | 36 | 54 |
+------------------+-----------+-----------+-----------+
The answer is correct. But, the First 3 columns are merged into one column. I need to separate this into individual columns similar to my desired output
mentioned above
how should I solve this issue and What is the Efficient way to do this?
Upvotes: 2
Views: 72
Reputation: 22493
One way using melt
, merge
and groupby
:
print (df_col.melt(id_vars=["SQ", "City", "Date"], var_name="LocationNo")
.merge(df_row, how="left", on="LocationNo")
.groupby(["SQ", "City","LocationType", "Date"])["value"].sum()
.unstack("LocationType"))
LocationType Class A Class B Class C
SQ City Date
1 A 7-1-2020 77 49 68
7-2-2020 61 38 41
2 B 7-2-2020 41 36 54
Upvotes: 3
Reputation: 71689
Let's filter
the loc
like columns from df_col
, then map these columns to LocationType
based on LocationNo
from df_row
, finally groupby
the dataframe d
on these mapped columns along axis=1
and agg
using sum
:
d = df_col.filter(like='Loc')
g = d.columns.map(df_row.set_index('LocationNo')['LocationType'])
out = df_col[['SQ','City','Date']].join(d.groupby(g, axis=1).sum())
SQ City Date Class A Class B Class C
0 1 A 7-1-2020 77 49 68
1 1 A 7-1-2020 61 38 41
2 2 B 7-2-2020 41 36 54
Upvotes: 4