suptagni

Reputation: 61

How can I remove duplicate cells of any row in pandas DataFrame?

I need to update a pandas DataFrame as shown below. Is this possible? [I really appreciate your time and effort. Sorry that my question caused confusion; I have tried to update it. Thanks again.]

Sample 1:

import pandas as pd
# original dataframe
data = {'row_1': ['x', 'y', 'x', 'y'], 'row_2': ['a', 'b', 'a', None]}
data = pd.DataFrame.from_dict(data, orient='index')
print(data)

# desired dataframe from data
data1 = {'row_1': ['x', 'y'], 'row_2': ['a', 'b']}
data1 = pd.DataFrame.from_dict(data1, orient='index')
print(data1)

Sample 2:

import pandas as pd
# original dataframe
data = {'row_1': ['x', 'y', 'p', 'x'], 'row_2': ['a', 'b', 'a', None]}
data = pd.DataFrame.from_dict(data, orient='index')
print(data)

# desired dataframe from data
data1 = {'row_1': ['x', 'y', 'p'], 'row_2': ['a', 'b']}
data1 = pd.DataFrame.from_dict(data1, orient='index')
print(data1)
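The desired frame in Sample 2 has rows of different lengths. from_dict(orient='index') still builds it, padding the shorter row with a missing value, so any solution has to produce that same padded layout. A minimal check:

```python
import pandas as pd

# Desired frame from Sample 2: row_1 has three values, row_2 only two
data1 = pd.DataFrame.from_dict({'row_1': ['x', 'y', 'p'], 'row_2': ['a', 'b']}, orient='index')

print(data1.shape)            # (2, 3): one column per position in the longest row
print(data1.loc['row_2', 2])  # the padded cell holds a missing value
```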

Upvotes: 0

Views: 584

Answers (3)

Zero

Reputation: 1899

You can just do this:

data = data.T.loc[data.T["row_1"].drop_duplicates().index, :].T

Output:

       0  1
row_1  x  y
row_2  a  b
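This approach keeps whole columns, selecting them by the first occurrence of each value in row_1, so it only works when the duplicates line up across rows. The quick check below runs it on both samples. The helper name dedup_by_row_1 is just for illustration. Sample 1 comes out as desired, but in Sample 2 row_2 still contains 'a' twice.

```python
import pandas as pd

def dedup_by_row_1(df):
    # Transpose so the original rows become columns, keep the positions where
    # row_1 shows each value for the first time, then transpose back
    t = df.T
    return t.loc[t["row_1"].drop_duplicates().index, :].T

sample1 = pd.DataFrame.from_dict({'row_1': ['x', 'y', 'x', 'y'], 'row_2': ['a', 'b', 'a', None]}, orient='index')
sample2 = pd.DataFrame.from_dict({'row_1': ['x', 'y', 'p', 'x'], 'row_2': ['a', 'b', 'a', None]}, orient='index')

print(dedup_by_row_1(sample1))  # row_1: x y, row_2: a b
print(dedup_by_row_1(sample2))  # row_1: x y p, row_2: a b a
```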

Upvotes: 0

user2736738

Reputation: 30926

# per row: drop missing cells, then keep only the unique values
data = data.apply(lambda row: row.dropna().unique(), axis=1)

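This is what you are looking for. Use dropna to get rid of the NaNs (None also counts as missing), then keep only the unique elements. Applying this logic to each row of the dataframe (axis=1) gives the desired values. Note that the result is a Series holding one array of values per row, not a DataFrame.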
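Because apply returns one sequence per row here, the result is a Series rather than a DataFrame. The sketch below rebuilds a DataFrame from the per-row unique values. It assumes you want the same padded layout as the desired frames in the question, and unique_per_row is a hypothetical helper name.

```python
import pandas as pd

def unique_per_row(df):
    # For each row: drop missing cells, then keep unique values in order of first appearance
    values = df.apply(lambda row: row.dropna().unique().tolist(), axis=1)
    # Rebuild a DataFrame from the list of lists; shorter rows are padded with missing values
    return pd.DataFrame(values.tolist(), index=df.index)

sample2 = pd.DataFrame.from_dict({'row_1': ['x', 'y', 'p', 'x'], 'row_2': ['a', 'b', 'a', None]}, orient='index')
print(unique_per_row(sample2))  # row_1: x y p, row_2: a b plus one padded missing cell
```

Converting each row to a plain list before rebuilding lets the DataFrame constructor handle rows of different lengths, which Sample 2 needs.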

Upvotes: 1

memo

Reputation: 177

You can use the duplicated method. Check out this link for an example in the pandas API reference.
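Series.duplicated marks every repeat of a value after its first occurrence, so it can be applied row by row. The sketch below assumes missing cells should also be dropped, as in the question. keep_first_unique is a hypothetical helper name.

```python
import pandas as pd

def keep_first_unique(row):
    # duplicated(keep='first') is True for repeats after the first occurrence;
    # notna() additionally drops None/NaN cells
    mask = ~row.duplicated(keep='first') & row.notna()
    return row[mask].tolist()

sample2 = pd.DataFrame.from_dict({'row_1': ['x', 'y', 'p', 'x'], 'row_2': ['a', 'b', 'a', None]}, orient='index')

# Build the result row by row; shorter rows are padded with missing values
result = pd.DataFrame([keep_first_unique(r) for _, r in sample2.iterrows()], index=sample2.index)
print(result)
```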

Upvotes: 0
