Reputation: 33
So i got an Pandas DataFrame that looks like this:
import pandas as pd
df1 = pd.DataFrame([[5618, 5863, 8873, 7903, 9477, 7177, 7648, 9592],
[5698, 6009, 8242, 7356, 6191, 8817, 7340, 11781],
[5721, 6858, 8401, 6826, 6910, 6243, 6814, 9704]],
columns=["Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan", "Feb"])
Output:
Jul Aug Sep Oct Nov Dec Jan Feb
0 5618 5863 8873 7903 9477 7177 7648 9592
1 5698 6009 8242 7356 6191 8817 7340 11781
2 5721 6858 8401 6826 6910 6243 6814 9704
At first i want to insert 2 new rows with index 3 and 4: In the first one i want to divide the values of row 1 by the values of row 0:
Jul Aug Sep Oct Nov Dec Jan Feb
3 101,42 102,49 92,88 93,07 65,32 122,8 95,97 122,82
In the second one i want to divide the values of row 1 by the values of row 2:
Jul Aug Sep Oct Nov Dec Jan Feb
4 99,59 87,62 98,10 107,76 89,59 141,23 107,71 121,40
In the next step i want to create a new column with the sum of the raw-values of the month and the averages of the new created rows.
df1["Sum_Avg"] = df1.sum(axis=1)
Output:
Jul Aug Sep Oct Nov Dec Jan Feb Sum_Avg
0 5618 5863 8873 7903 9477 7177 7648 9592 62151
1 5698 6009 8242 7356 6191 8817 7340 11781 61434
2 5721 6858 8401 6826 6910 6243 6814 9704 57477
I don't know how to create the rows with index 3 and 4, so i even don't know how to put the averages in the same row as the sums.
At the end the full table should look like this: Img
What i tried so far:
Making a new DataFrame with the Row 0:
df2 = pd.DataFrame(df1.iloc[[0]])
df2
Output:
Jul Aug Sep Oct Nov Dec Jan Feb
0 5618 5863 8873 7903 9477 7177 7648 9592
Making a new DataFrame with Row 1:
df3 = pd.DataFrame(df1.iloc[[1]])
df3
Output:
Jul Aug Sep Oct Nov Dec Jan Feb
1 5698 6009 8242 7356 6191 8817 7340 11781
Making a new DataFrame with the division of df2 and df3:
df4 = df3/df2
df4
Output:
Jul Aug Sep Oct Nov Dec Jan Feb
0 NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN
and here things messed up and this is the reason why i'm creating this post.
Upvotes: 3
Views: 140
Reputation: 862591
Use DataFrame.div
with shifted values by DataFrame.shift
, remove first only NaN
row by indexing and add to original by concat
:
df1["Sum_Avg"] = df1.sum(axis=1)
df = pd.concat([df1, df1.div(df1.shift()).iloc[1:]], ignore_index=True)
print (df)
Jul Aug Sep Oct Nov \
0 5618.000000 5863.000000 8873.000000 7903.000000 9477.000000
1 5698.000000 6009.000000 8242.000000 7356.000000 6191.000000
2 5721.000000 6858.000000 8401.000000 6826.000000 6910.000000
3 1.014240 1.024902 0.928885 0.930786 0.653266
4 1.004037 1.141288 1.019291 0.927950 1.116136
Dec Jan Feb Sum_Avg
0 7177.000000 7648.000000 9592.000000 62151.000000
1 8817.000000 7340.000000 11781.000000 61434.000000
2 6243.000000 6814.000000 9704.000000 57477.000000
3 1.228508 0.959728 1.228211 0.988464
4 0.708064 0.928338 0.823699 0.935589
Solution by ouput data:
df1["Sum_Avg"] = df1.sum(axis=1)
df = pd.concat([df1, df1.iloc[1].div(df1.iloc[[0,2]]) ], ignore_index=True)
print (df)
Jul Aug Sep Oct Nov \
0 5618.00000 5863.000000 8873.000000 7903.000000 9477.000000
1 5698.00000 6009.000000 8242.000000 7356.000000 6191.000000
2 5721.00000 6858.000000 8401.000000 6826.000000 6910.000000
3 1.01424 1.024902 0.928885 0.930786 0.653266
4 0.99598 0.876203 0.981074 1.077644 0.895948
Dec Jan Feb Sum_Avg
0 7177.000000 7648.000000 9592.000000 62151.000000
1 8817.000000 7340.000000 11781.000000 61434.000000
2 6243.000000 6814.000000 9704.000000 57477.000000
3 1.228508 0.959728 1.228211 0.988464
4 1.412302 1.077194 1.214035 1.068845
Upvotes: 3
Reputation: 9197
You could try this:
df = df1.T
df[3] = df[1] / df[0]
df[4] = df[1] / df[2]
df1 = df.T
df1["Sum_Avg"] = df1.sum(axis=1)
# Jul Aug ... Feb Sum_Avg
# 0 5618.00000 5863.000000 ... 9592.000000 62151.000000
# 1 5698.00000 6009.000000 ... 11781.000000 61434.000000
# 2 5721.00000 6858.000000 ... 9704.000000 57477.000000
# 3 1.01424 1.024902 ... 1.228211 7.968526
# 4 0.99598 0.876203 ... 1.214035 8.530380
# [5 rows x 9 columns]
Upvotes: 0