Ana

Reputation: 1586

How to sum over certain rows of a data frame in Python

I have a data frame A, and I would like to sum over the rows whose row index value is a number greater than or equal to 10. If this is not possible, I can live with code that sums over rows 2-3 instead.

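import pandas as pd
import numpy as np

# Data frame A, built from the table in the question
A = pd.DataFrame({
    "Tier": ["up to 2M", "5M", "10M", "15M"],
    "Oct": [4, 3, 6, 1],
    "Nov": [5, 2, 0, 3],
    "Dec": [10, 7, 2, 5],
})
# Column totals over every row, restricted to the month columns
tenplus = pd.Series(A.sum(axis=0), index=A.columns[1:])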

But this sums over the whole table. One thing I could do is build another data frame from rows 2-3 and sum over it, but I would prefer to learn the best practice!

Thanks!

Upvotes: 3

Views: 17101

Answers (2)

ali_m

Reputation: 74182

You can use normal slice indexing to select the rows you want to sum over:

print(df)
#        Tier  Oct  Nov  Dec
# 0  up to 2M    4    5   10
# 1        5M    3    2    7
# 2       10M    6    0    2
# 3       15M    1    3    5

# select the last two rows
print(df[2:4])
#   Tier  Oct  Nov  Dec
# 2  10M    6    0    2
# 3  15M    1    3    5

# sum over them
print(df[2:4].sum())
# Tier    10M15M
# Oct          7
# Nov          3
# Dec          7
# dtype: object

As you can see, summing the Tier column gives a meaningless result, since "summing" strings just concatenates them. It would make more sense to sum over only the last three columns:

# select the last two rows and the last 3 columns
# (.loc slices by label and includes the end label, so 2:3 covers rows 2 and 3)
print(df.loc[2:3, ['Oct', 'Nov', 'Dec']])
#    Oct  Nov  Dec
# 2    6    0    2
# 3    1    3    5

# sum over them
print(df.loc[2:3, ['Oct', 'Nov', 'Dec']].sum())
# Oct    7
# Nov    3
# Dec    7
# dtype: int64

# alternatively, use df.iloc[2:4, 1:] to select by column index rather than name

You can read more about how indexing works in the pandas documentation on indexing and selecting data.
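For the original goal of summing only the tiers of 10M and above, a boolean mask can stand in for the fixed slice. The following is a minimal sketch rather than a definitive solution: it rebuilds the example frame from the question, and it assumes that the first run of digits in each Tier label is the tier size in millions. The names `months`, `tier_size`, `by_position`, and `by_condition` are only illustrative.

```python
import pandas as pd

# Rebuild the example frame (values copied from the table in the question).
df = pd.DataFrame({
    "Tier": ["up to 2M", "5M", "10M", "15M"],
    "Oct": [4, 3, 6, 1],
    "Nov": [5, 2, 0, 3],
    "Dec": [10, 7, 2, 5],
})

months = ["Oct", "Nov", "Dec"]

# Positional selection: iloc excludes the end, so 2:4 means positions 2 and 3.
by_position = df.iloc[2:4][months].sum()

# Condition-based selection.
# Assumption: the first run of digits in Tier is the size in millions,
# e.g. "up to 2M" -> 2 and "10M" -> 10.
tier_size = df["Tier"].str.extract(r"(\d+)", expand=False).astype(int)
by_condition = df.loc[tier_size >= 10, months].sum()

print(by_condition)
# Oct    7
# Nov    3
# Dec    7
# dtype: int64
```

The mask keeps working if rows are added or reordered, whereas the slice depends on the qualifying tiers sitting in the last two positions.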

Upvotes: 3

Andy Hayden

Reputation: 375535

sum has an axis argument; pass axis=1 to add up the columns within each row, which gives one total per row:

In [11]: df
Out[11]:
       Tier  Oct  Nov  Dec
0  up to 2M    4    5   10
1        5M    3    2    7
2       10M    6    0    2
3       15M    1    3    5

In [12]: df.sum(axis=1, numeric_only=True)
Out[12]:
0    19
1    12
2     8
3     9
dtype: int64

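Note: The Tier column is not numeric and has to be left out of the sum. Older pandas versions dropped such columns silently, while pandas 2.0 and later no longer do so by default, so it is safer to filter them out explicitly before summing: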

In [13]: df[['Oct', 'Nov', 'Dec']].sum(axis=1)
Out[13]:
0    19
1    12
2     8
3     9
dtype: int64
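If hard-coding the column names is undesirable, the numeric columns can be picked by dtype instead. This is a small sketch under the assumption that every numeric column should take part in the sum. The frame is rebuilt from the question's table, and `numeric`, `row_totals`, and `last_two_total` are illustrative names.

```python
import pandas as pd

# Rebuild the example frame (values copied from the table in the question).
df = pd.DataFrame({
    "Tier": ["up to 2M", "5M", "10M", "15M"],
    "Oct": [4, 3, 6, 1],
    "Nov": [5, 2, 0, 3],
    "Dec": [10, 7, 2, 5],
})

# Keep only the numeric columns; the string column Tier is dropped.
numeric = df.select_dtypes(include="number")

# axis=1 adds across the columns, producing one total per row.
row_totals = numeric.sum(axis=1)
print(row_totals.tolist())  # [19, 12, 8, 9]

# Combined total for the rows at positions 2 and 3 only.
last_two_total = row_totals.iloc[2:4].sum()
print(last_two_total)  # 17
```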

Upvotes: 0
