agm
agm

Reputation: 327

How to sum rows in the same column than the category in pandas dataframe - python

I have been working on formatting a log file and finally I have arrived to the following dataframe sample, where the categories and numbers I want to add are in the same column:

df = pd.DataFrame(dict(a=['Cat. A',1,1,3,'Cat. A',2,2,'Cat. B',3,5,2,6,'Cat. B',1,'Cat. C',4]))
>>> a
0   Cat. A
1   1
2   1
3   3
4   Cat. A
5   2
6   2
7   Cat. B
8   3
9   5
10  2
11  6
12  Cat. B
13  1
14  Cat. C
15  4

If I sum all the numbers below each category I want to obtain:

Cat. A= 1+1+3+2+2 = 9
Cat. B= 3+5+2+6+1 = 17
Cat. C= 4

I know how to do with going through all the files in the classic way, but I would like to know how to do it in a most pythonic way, considering that the numbers of rows for each category can be variable, and that the numbers of times that the category appears in each dataframe can be different too.

Upvotes: 6

Views: 1719

Answers (4)

Nour-Allah Hussein
Nour-Allah Hussein

Reputation: 1459

In a very fiew lines, in a simple and direct way:

import pandas as pd
df = pd.DataFrame(dict(a=['Cat. A', 1, 1, 3, 'Cat. A', 2, 2, 'Cat. B', 3, 5, 2, 6, 'Cat. B', 1, 'Cat. C', 4]))

df['label'] = (df['a'].apply(lambda x: x if str(x)[0] == 'C' else None)).ffill()
df['value'] = df['a'].apply(lambda x: 0 if str(x)[0] == 'C' else int(x))
result = df.groupby('label').sum()

print(result)

output

        value
label         
Cat. A       9
Cat. B      17
Cat. C       4

Explanation

label and 'value' columns are created and store labels and values, respectively then use groupby with sum

Upvotes: 0

Ch3steR
Ch3steR

Reputation: 20669

We can use pd.to_numeric to mark non-numeric fields as nan using Series.mask and Series.notna then use for group. Then use GroupBy.sum

a = pd.to_numeric(df['a'], errors='coerce')
g = df['a'].mask(a.notna()).ffill()
a.groupby(g).sum()

Cat. A     9.0
Cat. B    17.0
Cat. C     4.0
Name: a, dtype: float64

Upvotes: 3

mccandar
mccandar

Reputation: 788

This is also another way

df = pd.DataFrame(dict(a=['Cat. A',1,1,3,'Cat. A',2,2,'Cat. B',3,5,2,6,'Cat. B',1,'Cat. C',4]))

def coerce(x):
    try:
        int(x)
        return np.nan
    except:
        return x

def safesum(x):
    return x[x!=x.iloc[0]].astype(int).sum()


df['b'] = df['a'].apply(coerce).ffill()
df.groupby('b').agg(safesum)

Produces

         a
b         
Cat. A   9
Cat. B  17
Cat. C   4

Upvotes: 6

IoaTzimas
IoaTzimas

Reputation: 10624

Here is one way to do it. not very pythonic, but it does the work:

l=list(df.a)

d={i:0 for i in l if str(i)[:3]=='Cat'}

for i in l:
    if str(i)[:3]=='Cat':
        key=i
    else:
        d[key]+=i

>>> print(d)

{'Cat. A': 9, 'Cat. B': 17, 'Cat. C': 4}

Upvotes: 0

Related Questions