Andrew

Reputation: 714

How to use pandas to rename rows when they are the same in a column?

According to:

How to use pandas to rename rows when they are the same in column A?

My dataframe is: [image: input dataframe]

I want to rename a Hospital value when rows that share it have different values in the GeneralRepresentation column. When rows that share a Hospital value also share the same GeneralRepresentation value, Hospital should not be renamed. Hospitals without a GeneralRepresentation should keep their name as well.

The effect I want is shown below:

[image: desired result]

When I use Beny's code in How to use pandas to rename rows when they are the same in column A?:

g = df.groupby('Hospital')['GeneralRepresentation']
s1 = g.transform(lambda x :x.factorize()[0]+1).astype(str)
s2 = g.transform('nunique')
df['Hospital'] = np.where(s2==1, df['Hospital'], df['Hospital'] + '_' + s1,)

The effect is shown below: [image: result of the code above]

But I want the hospital name to stay the same when a hospital has no GeneralRepresentation, as in the second picture. How do I modify this code to meet that requirement?

Upvotes: 0

Views: 474

Answers (2)

wwnde

Reputation: 26676

Use np.select(list of conditions, list of choices, default)

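import numpy as np

# a: rows whose GeneralRepresentation contains no word character
a = ~(df['GeneralRepresentation'].str.contains(r'\w'))
# b: rows with a GeneralRepresentation where both Hospital and GeneralRepresentation are duplicated
b = (df['GeneralRepresentation'].str.contains(r'\w') & df['Hospital'].duplicated(keep=False) & df['GeneralRepresentation'].duplicated(keep=False))

df['Hospital'] = np.select([a, b], [df['Hospital'] + '_' + (df.groupby('Hospital').cumcount() + 1).astype(str), ''], df['Hospital'])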
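As a rough illustration of the np.select(conditions, choices, default) pattern, here is a minimal sketch. The sample data is an assumption (inferred from the output printed in the other answer, since the question's images are not available), and the two conditions are one reading of the requirement, not the exact masks above. np.select takes the first condition that matches for each row, so the "no GeneralRepresentation" case is listed first.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real dataframe was only posted as an image.
df = pd.DataFrame({
    "Hospital": ["a", "b", "b", "c", "c", "d", "t"],
    "GeneralRepresentation": ["a", "b", "c", "d", "e", np.nan, np.nan],
})

rep = df["GeneralRepresentation"]

# True for rows whose hospital has more than one distinct representation.
several_reps = df.groupby("Hospital")["GeneralRepresentation"].transform("nunique") > 1

# Row counter within each hospital, starting at 1, used as the suffix.
suffix = (df.groupby("Hospital").cumcount() + 1).astype(str)

conditions = [rep.isna(), several_reps]
choices = [df["Hospital"], df["Hospital"] + "_" + suffix]

# Rows matching no condition fall back to the default (the unchanged name).
df["Hospital"] = np.select(conditions, choices, default=df["Hospital"])

print(df["Hospital"].tolist())
# ['a', 'b_1', 'b_2', 'c_1', 'c_2', 'd', 't']
```

Note that cumcount numbers rows, not distinct representations, so two rows of one hospital with the same GeneralRepresentation would still get different suffixes when that hospital also has another representation.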

Upvotes: 1

jezrael

Reputation: 862681

The problem is the missing values: factorize assigns -1 to missing values, so adding 1 gives 0 for the last 2 rows. In my solution, NaN is replaced with empty strings before the groupby to prevent this:

g = df.fillna({'GeneralRepresentation': ''}).groupby('Hospital')['GeneralRepresentation']
s1 = g.transform(lambda x: x.factorize()[0] + 1).astype(str)
s2 = g.transform('nunique')
df['Hospital'] = np.where(s2 == 1, df['Hospital'], df['Hospital'] + '_' + s1)
print(df)
  Hospital GeneralRepresentation
0        a                     a
1      b_1                     b
2      b_2                     c
3      c_1                     d
4      c_2                     e
5        d                   NaN
6        t                   NaN
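To see why the fillna step matters, the short check below looks at the two pieces separately. It is only a sketch on assumed sample data (rebuilt from the output above, the original image is not available). It shows that factorize codes missing values as -1, and that nunique skips NaN, so an all-NaN hospital counts 0 distinct values instead of 1 and fails the s2==1 test.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, rebuilt from the printed result above.
df = pd.DataFrame({
    "Hospital": ["a", "b", "b", "c", "c", "d", "t"],
    "GeneralRepresentation": ["a", "b", "c", "d", "e", np.nan, np.nan],
})

# factorize uses -1 as the code for missing values.
codes, uniques = df["GeneralRepresentation"].factorize()
print(codes.tolist())  # [0, 1, 2, 3, 4, -1, -1]

# nunique ignores NaN by default, so hospitals d and t count 0 distinct values.
raw_counts = df.groupby("Hospital")["GeneralRepresentation"].transform("nunique")
print(raw_counts.tolist())  # [1, 2, 2, 2, 2, 0, 0]

# After filling NaN with '', the empty string counts as one distinct value.
filled = df.fillna({"GeneralRepresentation": ""})
filled_counts = filled.groupby("Hospital")["GeneralRepresentation"].transform("nunique")
print(filled_counts.tolist())  # [1, 2, 2, 2, 2, 1, 1]
```

Because fillna returns a new dataframe here, the GeneralRepresentation column of df itself still shows NaN, which matches the printed result.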

Upvotes: 1
