krat0r
krat0r

Reputation: 33

Pandas: Dataframe Calculation - New Rows with Division, New Columns with Sums and Averages

So i got an Pandas DataFrame that looks like this:

import pandas as pd

df1 = pd.DataFrame([[5618, 5863, 8873, 7903, 9477, 7177, 7648, 9592],
                    [5698, 6009, 8242, 7356, 6191, 8817, 7340, 11781],
                    [5721, 6858, 8401, 6826, 6910, 6243, 6814, 9704]], 
                    columns=["Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan", "Feb"])

Output:

    Jul     Aug     Sep     Oct     Nov     Dec     Jan     Feb
0   5618    5863    8873    7903    9477    7177    7648    9592
1   5698    6009    8242    7356    6191    8817    7340    11781
2   5721    6858    8401    6826    6910    6243    6814    9704

At first i want to insert 2 new rows with index 3 and 4: In the first one i want to divide the values of row 1 by the values of row 0:

    Jul     Aug     Sep     Oct     Nov     Dec     Jan     Feb
3   101,42  102,49  92,88   93,07   65,32   122,8   95,97   122,82

In the second one i want to divide the values of row 1 by the values of row 2:

    Jul     Aug     Sep     Oct     Nov     Dec     Jan     Feb
4   99,59   87,62   98,10   107,76  89,59   141,23  107,71  121,40

In the next step i want to create a new column with the sum of the raw-values of the month and the averages of the new created rows.

df1["Sum_Avg"] = df1.sum(axis=1)
    
Output:

    Jul     Aug     Sep     Oct     Nov     Dec     Jan     Feb     Sum_Avg
0   5618    5863    8873    7903    9477    7177    7648    9592    62151
1   5698    6009    8242    7356    6191    8817    7340    11781   61434
2   5721    6858    8401    6826    6910    6243    6814    9704    57477

I don't know how to create the rows with index 3 and 4, so i even don't know how to put the averages in the same row as the sums.

At the end the full table should look like this: Img

What i tried so far:

Making a new DataFrame with the Row 0:

df2 = pd.DataFrame(df1.iloc[[0]])
df2

Output:

    Jul     Aug     Sep     Oct     Nov     Dec     Jan     Feb
0   5618    5863    8873    7903    9477    7177    7648    9592

Making a new DataFrame with Row 1:

df3 = pd.DataFrame(df1.iloc[[1]])
df3

Output:

    Jul     Aug     Sep     Oct     Nov     Dec     Jan     Feb
1   5698    6009    8242    7356    6191    8817    7340    11781

Making a new DataFrame with the division of df2 and df3:

df4 = df3/df2
df4

Output:

    Jul     Aug     Sep     Oct     Nov     Dec     Jan     Feb
0   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
1   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN

and here things messed up and this is the reason why i'm creating this post.

Upvotes: 3

Views: 140

Answers (2)

jezrael
jezrael

Reputation: 862591

Use DataFrame.div with shifted values by DataFrame.shift, remove first only NaN row by indexing and add to original by concat:

df1["Sum_Avg"] = df1.sum(axis=1)
df = pd.concat([df1, df1.div(df1.shift()).iloc[1:]], ignore_index=True)
print (df)
           Jul          Aug          Sep          Oct          Nov  \
0  5618.000000  5863.000000  8873.000000  7903.000000  9477.000000   
1  5698.000000  6009.000000  8242.000000  7356.000000  6191.000000   
2  5721.000000  6858.000000  8401.000000  6826.000000  6910.000000   
3     1.014240     1.024902     0.928885     0.930786     0.653266   
4     1.004037     1.141288     1.019291     0.927950     1.116136   

           Dec          Jan           Feb       Sum_Avg  
0  7177.000000  7648.000000   9592.000000  62151.000000  
1  8817.000000  7340.000000  11781.000000  61434.000000  
2  6243.000000  6814.000000   9704.000000  57477.000000  
3     1.228508     0.959728      1.228211      0.988464  
4     0.708064     0.928338      0.823699      0.935589  

Solution by ouput data:

df1["Sum_Avg"] = df1.sum(axis=1)
df = pd.concat([df1, df1.iloc[1].div(df1.iloc[[0,2]]) ], ignore_index=True)


print (df)
          Jul          Aug          Sep          Oct          Nov  \
0  5618.00000  5863.000000  8873.000000  7903.000000  9477.000000   
1  5698.00000  6009.000000  8242.000000  7356.000000  6191.000000   
2  5721.00000  6858.000000  8401.000000  6826.000000  6910.000000   
3     1.01424     1.024902     0.928885     0.930786     0.653266   
4     0.99598     0.876203     0.981074     1.077644     0.895948   

           Dec          Jan           Feb       Sum_Avg  
0  7177.000000  7648.000000   9592.000000  62151.000000  
1  8817.000000  7340.000000  11781.000000  61434.000000  
2  6243.000000  6814.000000   9704.000000  57477.000000  
3     1.228508     0.959728      1.228211      0.988464  
4     1.412302     1.077194      1.214035      1.068845  

Upvotes: 3

Andreas
Andreas

Reputation: 9197

You could try this:

df = df1.T

df[3] = df[1] / df[0]
df[4] = df[1] / df[2]

df1 = df.T
df1["Sum_Avg"] = df1.sum(axis=1)

#           Jul          Aug  ...           Feb       Sum_Avg
# 0  5618.00000  5863.000000  ...   9592.000000  62151.000000
# 1  5698.00000  6009.000000  ...  11781.000000  61434.000000
# 2  5721.00000  6858.000000  ...   9704.000000  57477.000000
# 3     1.01424     1.024902  ...      1.228211      7.968526
# 4     0.99598     0.876203  ...      1.214035      8.530380
# [5 rows x 9 columns]

Upvotes: 0

Related Questions