Wayne Werner
Wayne Werner

Reputation: 51907

How do I compute the counts of one column based on the values of two others in Pandas?

I have a dataset that includes three columns:

import pandas as pd

df = pd.DataFrame({'A': [1,2,3,2,3,3],
                   'B': [1.0, 2.0, 3.0, 2.0, 3.0, 3.0],                
                   'C': [0.0, 3.5, 1.2, 2.1, 3.1, 0.0]})

Now, obviously I can use df['A'].value_counts() to get me the counts of the values in column A:

df['A'].value_counts()
3    3
2    2
1    1
Name: A, dtype: int64

However, what I need is to be able to change the value of the count based on the relationship between B and C.

For instance:

df['B'][0] - df['C'][0]
1.0
df['B'][1] - df['C'][1]
-1.5

Im my case, I would like sums > 0 to count as 1, sums < 0 to count as -1, and sums of 0 to count as, well, 0.

So for my purposes, having B and C turn into something like this:

df = pd.DataFrame({'A':      [1,  2, 3,  2,  3, 3],
                   'counts': [1, -1, 1, -1, -1, 1]})

And then somehow be able to translate that into:

3  2
1  1
2 -2

Is what I'm after. How would I do this using pandas?

Upvotes: 1

Views: 72

Answers (2)

akuiper
akuiper

Reputation: 215117

import pandas as pd
import numpy as np

df['counts'] = np.sign(df.B - df.C)    # use the numpy.sign to create the count column
df.groupby('A')['counts'].sum()        # group the counts by column A and sum the value

#A
#1    1.0
#2   -2.0
#3    1.0
#Name: counts, dtype: float64

Upvotes: 4

SO44
SO44

Reputation: 1329

df['counts'] = 0
df.loc[df['B'] - df['C'] > 0, 'counts'] = 1
df.loc[df['B'] - df['C'] < 0, 'counts'] = -1

Upvotes: 2

Related Questions