And_R
And_R

Reputation: 1667

Grouping and Transforming in pandas

My data have the following structure:

       Name     Value       id
0    Alegro  0.850122   alegro
1    Alegro  0.447362   alegro
2    AlEgro  0.711295   alegro
3    ALEGRO  0.123761   alegro
4    alegRo  0.273111   alegro
5    ALEGRO  0.564893   alegro
6    ALEGRO  0.276369   alegro
7    ALEGRO  0.526434   alegro
8    ALEGRO  0.924014   alegro
9    ALEGrO  0.629207   alegro
10  Belagio  0.834231  belagio
11  BElagio  0.788357  belagio
12  Belagio  0.092156  belagio
13  BeLaGio  0.810275  belagio

To replicate run the code below:

data = {'Name': ['Alegro', 'Alegro', 'AlEgro', 'ALEGRO', 'alegRo', 'ALEGRO','ALEGRO',
                 'ALEGRO','ALEGRO','ALEGrO', 'Belagio','BElagio', 'Belagio', 'BeLaGio'],
     'Value': np.random.random(14)}

df = pd.DataFrame(data)

df['id'] = df.Name.str.lower()

You can see that there are some typos im dataset.

df.groupby('id').Name.value_counts()

id       Name   
alegro   ALEGRO     5
         Alegro     2
         ALEGrO     1
         AlEgro     1
         alegRo     1
belagio  Belagio    2
         BElagio    1
         BeLaGio    1

So the aim is to take the most frequent value from each category and set it as New name. For the first group it would be ALEGRO and for second Belagio.

The desired data frame should be:

       Name     Value       id
0    ALEGRO  0.850122   alegro
1    ALEGRO  0.447362   alegro
2    ALEGRO  0.711295   alegro
3    ALEGRO  0.123761   alegro
4    ALEGRO  0.273111   alegro
5    ALEGRO  0.564893   alegro
6    ALEGRO  0.276369   alegro
7    ALEGRO  0.526434   alegro
8    ALEGRO  0.924014   alegro
9    ALEGRO  0.629207   alegro
10  Belagio  0.834231  belagio
11  Belagio  0.788357  belagio
12  Belagio  0.092156  belagio
13  Belagio  0.810275  belagio

Any idea would be highly appreciated!

Upvotes: 5

Views: 47

Answers (1)

jezrael
jezrael

Reputation: 862641

Use GroupBy.transform for return Series with same size like original DataFrame, so possible create new column.

df['New'] = df.groupby('id').Name.transform(lambda x: x.value_counts().index[0])

Another solution:

df['New'] = df.groupby('id').Name.transform(lambda x: x.mode().iat[0])

print (df)
       Name     Value       id      New
0    Alegro  0.850122   alegro   ALEGRO
1    Alegro  0.447362   alegro   ALEGRO
2    AlEgro  0.711295   alegro   ALEGRO
3    ALEGRO  0.123761   alegro   ALEGRO
4    alegRo  0.273111   alegro   ALEGRO
5    ALEGRO  0.564893   alegro   ALEGRO
6    ALEGRO  0.276369   alegro   ALEGRO
7    ALEGRO  0.526434   alegro   ALEGRO
8    ALEGRO  0.924014   alegro   ALEGRO
9    ALEGrO  0.629207   alegro   ALEGRO
10  Belagio  0.834231  belagio  Belagio
11  BElagio  0.788357  belagio  Belagio
12  Belagio  0.092156  belagio  Belagio
13  BeLaGio  0.810275  belagio  Belagio

Upvotes: 5

Related Questions