Reputation: 1586
I have a DataFrame A, and I would like to sum over the rows whose row label (Tier) contains a number greater than or equal to 10.
If that is not possible, code that sums over rows 2-3 would also be fine.
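import pandas as pd

# Example data: Tier holds text labels, the month columns hold counts
A = pd.DataFrame({
    'Tier': ['up to 2M', '5M', '10M', '15M'],
    'Oct': [4, 3, 6, 1],
    'Nov': [5, 2, 0, 3],
    'Dec': [10, 7, 2, 5],
})
print(A)
#        Tier  Oct  Nov  Dec
# 0  up to 2M    4    5   10
# 1        5M    3    2    7
# 2       10M    6    0    2
# 3       15M    1    3    5

# My attempt: per-month totals, but over every row
tenplus = pd.Series(A.sum(axis=0), index=A.columns[1:])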
But this sums over all rows of the table. I could build another DataFrame from rows 2-3 and sum over that, but I would prefer to learn the best practice.
Upvotes: 3
Views: 17101
Reputation: 74182
You can use normal slice indexing to select the rows you want to sum over:
print(df)
# Tier Oct Nov Dec
# 0 up to 2M 4 5 10
# 1 5M 3 2 7
# 2 10M 6 0 2
# 3 15M 1 3 5
# select the last two rows
print(df[2:4])
# Tier Oct Nov Dec
# 2 10M 6 0 2
# 3 15M 1 3 5
# sum over them
print(df[2:4].sum())
# Tier 10M15M
# Oct 7
# Nov 3
# Dec 7
# dtype: object
As you can see, summing the Tier column gives a meaningless result, since "summing" strings just concatenates them. It makes more sense to sum over only the last three columns:
# select the last two rows and the last 3 columns
print(df.loc[2:4, ['Oct', 'Nov', 'Dec']])
# Oct Nov Dec
# 2 6 0 2
# 3 1 3 5
# sum over them
print(df.loc[2:4, ['Oct', 'Nov', 'Dec']].sum())
# Oct 7
# Nov 3
# Dec 7
# dtype: int64
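# note: .loc slices by label and includes the end point; 2:4 works here only because
# there is no label 4, so df.loc[2:3, ['Oct', 'Nov', 'Dec']] is the more precise spelling
# alternatively, df.iloc[2:4, 1:] selects both rows and columns by position rather than by label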
You can read more about how indexing works in the pandas documentation.
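The question's first preference, selecting rows by the number inside the Tier label rather than by position, can be sketched with a boolean mask. This is a minimal sketch, assuming every Tier label contains at most one number (e.g. "up to 2M" -> 2, "10M" -> 10); labels with no number become NaN and are treated as not matching. The names tier_num and mask are illustrative only.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    'Tier': ['up to 2M', '5M', '10M', '15M'],
    'Oct': [4, 3, 6, 1],
    'Nov': [5, 2, 0, 3],
    'Dec': [10, 7, 2, 5],
})

# Pull the first number out of each Tier label ("up to 2M" -> 2.0, "10M" -> 10.0).
# Assumption: one number per label; labels without a number become NaN.
tier_num = df['Tier'].str.extract(r'(\d+(?:\.\d+)?)', expand=False).astype(float)

# Boolean mask: True for rows whose Tier number is >= 10 (NaN compares as False)
mask = tier_num >= 10

# Per-month totals over the selected rows, numeric columns only
tenplus = df.loc[mask, ['Oct', 'Nov', 'Dec']].sum()
print(tenplus)  # Oct 7, Nov 3, Dec 7
```

If Tier is the index rather than a column, the same idea should carry over by calling .str.extract on df.index instead of df['Tier'].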
Upvotes: 3
Reputation: 375535
sum has an axis argument; pass axis=1 to sum along each row, giving one total per row:
In [11]: df
Out[11]:
Tier Oct Nov Dec
0 up to 2M 4 5 10
1 5M 3 2 7
2 10M 6 0 2
3 15M 1 3 5
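In [12]: df.sum(axis=1, numeric_only=True)
Out[12]:
0 19
1 12
2 8
3 9
dtype: int64
Note: numeric_only=True discards the non-numeric Tier column. Older pandas versions dropped it silently, but since pandas 2.0 numeric_only defaults to False, so without it this call raises a TypeError. You can also select the numeric columns explicitly before summing: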
In [13]: df[['Oct', 'Nov', 'Dec']].sum(axis=1)
Out[13]:
0 19
1 12
2 8
3 9
dtype: int64
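To contrast the two directions of sum, here is a minimal sketch on the question's data: axis=1 produces one total per row (as above), while axis=0 (the default) produces one total per column. Restricting the rows first and then summing with axis=0 gives per-month totals for rows 2-3, which is the fallback the question asks for. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Tier': ['up to 2M', '5M', '10M', '15M'],
    'Oct': [4, 3, 6, 1],
    'Nov': [5, 2, 0, 3],
    'Dec': [10, 7, 2, 5],
})
months = ['Oct', 'Nov', 'Dec']  # numeric columns only

# axis=1: add across the month columns, one value per row
row_totals = df[months].sum(axis=1)  # 19, 12, 8, 9

# axis=0 (default): add down the rows, one value per month
col_totals = df[months].sum(axis=0)  # Oct 14, Nov 10, Dec 24

# Select rows 2-3 by position first, then add down the rows
rows_2_3 = df[months].iloc[2:4].sum(axis=0)  # Oct 7, Nov 3, Dec 7
```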
Upvotes: 0