Bryce Ramgovind

Reputation: 3257

Pandas - Change Value of Group

I need to change the group label of rows whose group does not have enough points. For example:

+-----+
|c1|c2|
+-----+
|A |1 |
|A |2 |
|B |1 |
|A |2 |
|E |5 |
|E |6 |
|W |1 |
+-----+

Suppose I group on the values in c2, and each group must contain at least 2 points. The group sizes are:

c2:
1 : count(c1) = 3
2 : count(c1) = 2
5 : count(c1) = 1
6 : count(c1) = 1

Clearly, groups 5 and 6 have only 1 element each, so I would like to relabel the c2 values of those rows to -1.

This can be seen below.

+-----+
|c1|c2|
+-----+
|A |1 |
|A |2 |
|B |1 |
|A |2 |
|E |-1|
|E |-1|
|W |1 |
+-----+

This is the code I have written; however, it is not updating the dataframe.

labels = df["c2"].unique()
for l in labels:
    group_size = df[df["c2"]==l].shape[0]
    if group_size<=minPts:
        df[df["c2"]==l]["c2"] = -1

Upvotes: 1

Views: 1422

Answers (1)

jezrael

Reputation: 862431

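You can use value_counts to count the rows for each c2 value, filter the index for the values that occur fewer than 2 times, and finally set the matching rows to -1 with a boolean mask from isin: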

s = df['c2'].value_counts()
s = s.index[s < 2]
print (s)
Int64Index([6, 5], dtype='int64')

df.loc[df['c2'].isin(s), 'c2'] = -1
print (df)
  c1  c2
0  A   1
1  A   2
2  B   1
3  A   2
4  E  -1
5  E  -1
6  W   1
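
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the example frame from the question and applies the same value_counts / isin steps. The name min_pts is my own assumption, standing in for the minPts threshold in the question, and "at least 2 points" is read as keeping groups of size 2 or more.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "c1": ["A", "A", "B", "A", "E", "E", "W"],
    "c2": [1, 2, 1, 2, 5, 6, 1],
})

min_pts = 2  # assumed threshold: groups need at least this many rows

# Count rows per c2 value, then keep only the labels that are too small
counts = df["c2"].value_counts()
small_labels = counts.index[counts < min_pts]

# Relabel the rows of the undersized groups in place via .loc
df.loc[df["c2"].isin(small_labels), "c2"] = -1
print(df)
```

Note the strict comparison counts < min_pts: a group with exactly min_pts rows is kept.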

Detail:

print (df['c2'].value_counts())
1    3
2    2
6    1
5    1
Name: c2, dtype: int64

print (df['c2'].isin(s))
0    False
1    False
2    False
3    False
4     True
5     True
6    False
Name: c2, dtype: bool
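
As for why the loop in the question does not update anything: df[df["c2"]==l]["c2"] = -1 is chained indexing, so the assignment most likely lands on a temporary copy rather than on df itself. Writing through a single .loc call avoids that. The sketch below shows a corrected loop and, as an alternative I am suggesting rather than something from the question, a groupby / transform("size") version that needs no loop. The names min_pts, df_loop and df_transform are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "c1": ["A", "A", "B", "A", "E", "E", "W"],
    "c2": [1, 2, 1, 2, 5, 6, 1],
})
min_pts = 2  # assumed threshold: groups need at least this many rows

# Option 1: the original loop, but assigning through one .loc call
df_loop = df.copy()
for label in df_loop["c2"].unique():
    mask = df_loop["c2"] == label
    if mask.sum() < min_pts:          # strict <, so groups of size 2 survive
        df_loop.loc[mask, "c2"] = -1

# Option 2: per-row group size via transform, no explicit loop
df_transform = df.copy()
group_sizes = df_transform.groupby("c2")["c2"].transform("size")
df_transform.loc[group_sizes < min_pts, "c2"] = -1

print(df_loop)
print(df_transform)
```

Both options should give the same frame as the value_counts approach. Also note that the question's loop compares with <=, which would relabel groups of exactly minPts rows as well.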

Upvotes: 1
