Osama Billah
Osama Billah

Reputation: 101

How to remove duplicate from rows and convert its value to column in pandas

I have lot of data in the following format.

Person_ID           Person_value
1                    usr:value1
1                    val:value2
2                    usr:value1
2                    val:value2
3                    usr:value1
3                    val:value2
4                    usr:value1
4                    val:value2

But I want result like this:

Person_ID           Person_value            Person_value2
1                    Usr:value1              val:value2
2                    Usr:value1              val:value2
3                    Usr:value1              val:value2
4                    Usr:value1              val:value2

OR Like this::

Person_ID                  Person_value
1                    Usr:value1 // val:value2
2                    Usr:value1 // val:value2
3                    Usr:value1 // val:value2
4                    Usr:value1 // val:value2

Due to these Value it cause these duplicate value. and keeping the both value is very important.

Upvotes: 2

Views: 37

Answers (1)

jezrael
jezrael

Reputation: 863166

IIUC use GroupBy.cumcount for helper column with pivot:

df1 = (df.assign(a=df.groupby('Person_ID').cumcount().add(1))
         .pivot('Person_ID','a','Person_value')
         .add_prefix('Person_value'))
print (df1)
a           Person_value1 Person_value2
Person_ID                            
A              usr:value1    val:value2
B              usr:value1    val:value2
C              usr:value1    val:value2
D              usr:value1    val:value2

Or aggregate join:

df1 = (df.groupby('Person_ID')
         .agg(' // '.join)
         .reset_index())
print (df1)
  Person_ID              Person_value
0           A  usr:value1 // val:value2
1           B  usr:value1 // val:value2
2           C  usr:value1 // val:value2
3           D  usr:value1 // val:value2

If is necessary split rows by Person_value before : before pivot:

df1 = (df.assign(a=df['Person_value'].str.split(':').str[0])
         .pivot('Person_ID','a','Person_value'))
print (df1)
a                   usr         val
Person_name                        
A            usr:value1  val:value2
B            usr:value1  val:value2
C            usr:value1  val:value2
D            usr:value1  val:value2

Upvotes: 3

Related Questions