Reputation: 53
I have two dataframes that I would like to turn into one dataframe.
dataframe 1:
level_1 level_2 ... 2021-01-30 00:00:00 2021-01-31 00:00:00
0 A X ... 85.191861 (*) 61.387459
1 A Y ... 40.931960 76.803598
2 A Z ... 36.811113 44.834602
3 B X ... 60.707172 2.457655
4 B Y ... 53.990172 50.636714
5 B Z ... 52.347966 27.938671
6 C X ... 17.619055 89.028475
7 C Y ... 91.442144 56.108239
8 C Z ... 8.633474 14.716926
9 D X ... 41.690869 68.389570
10 D Y ... 37.896461 11.763265
11 D Z ... 12.492016 70.571018
dataframe 2:
level_2 level_3 2020-02-01 00:00:00 2020-02-02 00:00:00
0 X 1060 0.10 0.0
1 X 1064 0.20 1.0
2 X 1065 0.50 0.0
3 X 1067 0.00 0.0
4 X 1068 0.00 0.0
5 X 1264 0.20 0.0
6 X 1061 0.00 0.0
7 X 1000 0.00 0.0
8 Y 1060 0.05 0.1
9 Y 1064 0.05 0.2
10 Y 1065 0.10 0.5
11 Y 1067 0.20 0.0
12 Y 1068 0.20 0.0
13 Y 1264 0.10 0.2
14 Y 1061 0.15 0.0
15 Y 1000 0.15 0.0
16 Z 1060 0.00 0.0
17 Z 1064 0.00 0.0
18 Z 1065 0.00 0.0
19 Z 1067 0.90 0.9
20 Z 1068 0.10 0.1
21 Z 1264 0.00 0.0
22 Z 1061 0.00 0.0
23 Z 1000 0.00 0.0
I would like to add a third column to the first dataframe, level 3, and split the original row across the percentage given in dataframe 2. The numbers aren't correct, but the structure would look something like this:
level_1 level_2 level_3 2020-02-01 00:00:00
0 A X 1060 2.374184
1 A X 1064 4.748367
2 A X 1065 11.870918
3 A X 1067 0.000000
4 A X 1068 0.000000
5 A X 1264 4.748367
6 A X 1061 0.000000
7 A X 1000 0.000000
8 A Y 1060 2.813819
9 A Y 1064 2.813819
10 A Y 1065 5.627637
11 A Y 1067 11.255275
12 A Y 1068 11.255275
13 A Y 1264 5.627637
14 A Y 1061 8.441456
15 A Y 1000 8.441456
16 A Z 1060 0.000000
17 A Z 1064 0.000000
18 A Z 1065 0.000000
19 A Z 1067 24.890250
20 A Z 1068 2.765583
21 A Z 1264 0.000000
22 A Z 1061 0.000000
23 A Z 1000 0.000000
So every line dataframe would be split out across 8 factors (1060, 1064, 1065, 1067, 1068, 1264, 1061, 1000) So the original amount would 85.19 (*), would be split into 8 parts. The split is done according to the percentage in dataframe 2. I was thinking of stacking the dataframes and doing a merge, but currently haven't managed to make it work.
Thanks!
Upvotes: 0
Views: 96
Reputation: 31166
df1 = pd.read_csv(io.StringIO(""" level_1 level_2 ... 2021-01-30 00:00:00 2021-01-31 00:00:00
0 A X ... 85.191861 61.387459
1 A Y ... 40.931960 76.803598
2 A Z ... 36.811113 44.834602
3 B X ... 60.707172 2.457655
4 B Y ... 53.990172 50.636714
5 B Z ... 52.347966 27.938671
6 C X ... 17.619055 89.028475
7 C Y ... 91.442144 56.108239
8 C Z ... 8.633474 14.716926
9 D X ... 41.690869 68.389570
10 D Y ... 37.896461 11.763265
11 D Z ... 12.492016 70.571018
"""), sep="\s\s+", engine="python").drop(columns="...")
# cleanup DF1, make date columns rows
df1 = df1.set_index(["level_1","level_2"]).rename_axis(columns="Date").stack().reset_index()
df1.Date = pd.to_datetime(df1.Date)
df2 = pd.read_csv(io.StringIO(""" level_2 level_3 2021-01-30 00:00:00 2021-01-31 00:00:00
0 X 1060 0.10 0.0
1 X 1064 0.20 1.0
2 X 1065 0.50 0.0
3 X 1067 0.00 0.0
4 X 1068 0.00 0.0
5 X 1264 0.20 0.0
6 X 1061 0.00 0.0
7 X 1000 0.00 0.0
8 Y 1060 0.05 0.1
9 Y 1064 0.05 0.2
10 Y 1065 0.10 0.5
11 Y 1067 0.20 0.0
12 Y 1068 0.20 0.0
13 Y 1264 0.10 0.2
14 Y 1061 0.15 0.0
15 Y 1000 0.15 0.0
16 Z 1060 0.00 0.0
17 Z 1064 0.00 0.0
18 Z 1065 0.00 0.0
19 Z 1067 0.90 0.9
20 Z 1068 0.10 0.1
21 Z 1264 0.00 0.0
22 Z 1061 0.00 0.0
23 Z 1000 0.00 0.0"""), sep="\s\s+", engine="python")
df2 = df2.set_index(["level_2","level_3"]).rename_axis(columns="Date").stack().reset_index()
df2.Date = pd.to_datetime(df2.Date)
dfm = (df1.merge(df2, on=["level_2","Date"])
.assign(val=lambda dfa: dfa["0_x"]*dfa["0_y"])
.drop(columns=["0_x","0_y"])
.pivot(index=["level_1","level_2","level_3"], columns="Date", values="val")
.reset_index()
)
print(dfm.head(10).to_markdown())
level_1 | level_2 | level_3 | 2021-01-30 00:00:00 | 2021-01-31 00:00:00 | |
---|---|---|---|---|---|
0 | A | X | 1000 | 0 | 0 |
1 | A | X | 1060 | 8.51919 | 0 |
2 | A | X | 1061 | 0 | 0 |
3 | A | X | 1064 | 17.0384 | 61.3875 |
4 | A | X | 1065 | 42.5959 | 0 |
5 | A | X | 1067 | 0 | 0 |
6 | A | X | 1068 | 0 | 0 |
7 | A | X | 1264 | 17.0384 | 0 |
8 | A | Y | 1000 | 6.13979 | 0 |
9 | A | Y | 1060 | 2.0466 | 7.68036 |
Upvotes: 1