Reputation: 51907
I have a dataset that includes three columns:
import pandas as pd
df = pd.DataFrame({'A': [1,2,3,2,3,3],
'B': [1.0, 2.0, 3.0, 2.0, 3.0, 3.0],
'C': [0.0, 3.5, 1.2, 2.1, 3.1, 0.0]})
Now, obviously I can use df['A'].value_counts()
to get me the counts of the values in column A
:
df['A'].value_counts()
3 3
2 2
1 1
Name: A, dtype: int64
However, what I need is to be able to change the value of the count based on the relationship between B
and C
.
For instance:
df['B'][0] - df['C'][0]
1.0
df['B'][1] - df['C'][1]
-1.5
Im my case, I would like sums > 0
to count as 1, sums < 0
to count as -1, and sums of 0
to count as, well, 0.
So for my purposes, having B
and C
turn into something like this:
df = pd.DataFrame({'A': [1, 2, 3, 2, 3, 3],
'counts': [1, -1, 1, -1, -1, 1]})
And then somehow be able to translate that into:
3 2
1 1
2 -2
Is what I'm after. How would I do this using pandas?
Upvotes: 1
Views: 72
Reputation: 215117
import pandas as pd
import numpy as np
df['counts'] = np.sign(df.B - df.C) # use the numpy.sign to create the count column
df.groupby('A')['counts'].sum() # group the counts by column A and sum the value
#A
#1 1.0
#2 -2.0
#3 1.0
#Name: counts, dtype: float64
Upvotes: 4
Reputation: 1329
df['counts'] = 0
df.loc[df['B'] - df['C'] > 0, 'counts'] = 1
df.loc[df['B'] - df['C'] < 0, 'counts'] = -1
Upvotes: 2