Reputation: 71
I have a dataframe similar to this one:
col_a col_b
A 1
B 6
B 3
C 2
C 3
D 6
E 7
F 8
E 8
I want to create a column c and do a cumulative count of everything larger than 1, 2, 3, 4, 5, 6, 7, and 8.
for example, for row 1, the number is 2, so I count total number larger or equal to 2 in col b for row 2, the number is 6, so I count total number larger than or equal to 6 in col b
the returned col should be something like this
col_c
8 (total count of col_b value that is larger or equal to 1)
5 (total count of col_b value that is larger or equal to 6)
6 (total count of col_b value that is larger or equal to 3)
7 (total count of col_b value that is larger or equal to 2)
My code looks like this :
df.loc[df['col_b'] >= 1, 'group'] = df[df['col_b'] >=8].count()
df.loc[df['col_b'] >= 2, 'group'] = df[df['col_b'] >=8].count()
df.loc[df['col_b'] >= 3, 'group'] = df[df['col_b'] >=8].count()
df.loc[df['col_b'] >= 4, 'group'] = df[df['col_b'] >=8].count()
df.loc[df['col_b'] >= 5, 'group'] = df[df['col_b'] >=8].count()
Is there anyway to make this easier? Also, my return is NA, instead of an actual count?
Upvotes: 1
Views: 77
Reputation: 1093
You should use sum instead of count and lambda function:
df = pd.DataFrame(
{
'col_a': ['A','B','B','C','C','D','E','F','E'],
'col_b': [1,6,3,2,3,6,7,8,8]
})
df['group'] = df.apply(lambda row : (df['col_b'] >= row['col_b']).sum() ,axis = 1)
Result:
col_a col_b group
0 A 1 9
1 B 6 5
2 B 3 7
3 C 2 8
4 C 3 7
5 D 6 5
6 E 7 3
7 F 8 2
8 E 8 2
Upvotes: 0
Reputation: 9619
df['col_c'] = df['col_b'].apply(lambda x: sum(i >= x for i in df['col_b'].tolist()))
Output:
| | col_a | col_b | col_c |
|---:|:--------|--------:|--------:|
| 0 | A | 1 | 9 |
| 1 | B | 6 | 5 |
| 2 | B | 3 | 7 |
| 3 | C | 2 | 8 |
| 4 | C | 3 | 7 |
| 5 | D | 6 | 5 |
| 6 | E | 7 | 3 |
| 7 | F | 8 | 2 |
| 8 | E | 8 | 2 |
Upvotes: 1
Reputation: 1624
Looks like you what you want is essentially a rank:
df['col_c'] = df['col_b'].rank(ascending=False, method='max')
col_a col_b col_c
0 A 1 9.0
1 B 6 5.0
2 B 3 7.0
3 C 2 8.0
4 C 3 7.0
5 D 6 5.0
6 E 7 3.0
7 F 8 2.0
8 E 8 2.0
You can subtract 1 to exclude the row itself.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rank.html
Upvotes: 0