Reputation: 61

How can I remove duplicate cells from each row of a pandas DataFrame?

I need to transform a pandas DataFrame as shown below, removing the duplicate values within each row. Is this possible? [Thank you all for your time and effort. I am sorry that my question caused confusion; I have updated it. Thanks again.]

Sample 1:

import pandas as pd

# original dataframe
data = {'row_1': ['x', 'y', 'x', 'y'], 'row_2': ['a', 'b', 'a', None]}
data = pd.DataFrame.from_dict(data, orient='index')
print(data)

# desired dataframe from data
data1 = {'row_1': ['x', 'y'], 'row_2': ['a', 'b']}
data1 = pd.DataFrame.from_dict(data1, orient='index')
print(data1)

Sample 2:

import pandas as pd

# original dataframe
data = {'row_1': ['x', 'y', 'p', 'x'], 'row_2': ['a', 'b', 'a', None]}
data = pd.DataFrame.from_dict(data, orient='index')
print(data)

# desired dataframe from data
data1 = {'row_1': ['x', 'y', 'p'], 'row_2': ['a', 'b']}
data1 = pd.DataFrame.from_dict(data1, orient='index')
print(data1)

Upvotes: 0

Answers (3)

Zero

Reputation: 1899

You can do this:

data = data.T.loc[data.T["row_1"].drop_duplicates().index, :].T

Output:

       0  1
row_1  x  y
row_2  a  b
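
To see what the one-liner does, it can be unpacked step by step. The sketch below is an illustration rather than part of the original answer: it runs on Sample 2 from the question, and the intermediate names (`transposed`, `first_seen`, `result`) are made up for readability. Only the duplicates in row_1 decide which columns survive.

```python
import pandas as pd

# Sample 2 from the question
data = pd.DataFrame.from_dict(
    {'row_1': ['x', 'y', 'p', 'x'], 'row_2': ['a', 'b', 'a', None]},
    orient='index',
)

# Transpose so that the original rows become columns
transposed = data.T

# Index labels of the first occurrence of each value in row_1
first_seen = transposed['row_1'].drop_duplicates().index

# Keep only those positions, then transpose back
result = transposed.loc[first_seen, :].T
print(result)
```

On Sample 2 this keeps columns 0 to 2, so row_2 still holds 'a' twice, which differs from the desired output. On Sample 1 it matches.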

Upvotes: 0

user2736738

Reputation: 30926

data = data.apply(lambda x: x.dropna().unique(), axis=1)

This should do what you are looking for: dropna removes the NaN values, and unique keeps only the distinct elements. apply with axis=1 runs this logic on each row of the DataFrame.
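
As a minimal sketch of the same idea, the per-row function can return a pandas Series so that apply expands the values into columns, padding shorter rows with NaN. Depending on the pandas version, returning the bare array from unique may instead give a Series of arrays. The helper name `unique_cells` is hypothetical, and the data is Sample 2 from the question.

```python
import pandas as pd

# Sample 2 from the question
data = pd.DataFrame.from_dict(
    {'row_1': ['x', 'y', 'p', 'x'], 'row_2': ['a', 'b', 'a', None]},
    orient='index',
)

def unique_cells(row):
    # Drop missing values, keep first occurrences in their original order,
    # and return a Series so that apply expands it into columns
    return pd.Series(row.dropna().unique())

result = data.apply(unique_cells, axis=1)
print(result)
```

Here row_1 keeps x, y, p and row_2 keeps a, b, with the third cell of row_2 left as NaN.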

Upvotes: 1

memo

Reputation: 177

You can use the duplicated method. See the pandas API reference for an example.
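
A rough sketch of how duplicated could be applied here, assuming the goal is to drop repeats and missing values within each row. It uses Sample 1 from the question; the names `mask`, `deduplicated`, `cleaned` and `result` are illustrative only.

```python
import pandas as pd

# Sample 1 from the question
data = pd.DataFrame.from_dict(
    {'row_1': ['x', 'y', 'x', 'y'], 'row_2': ['a', 'b', 'a', None]},
    orient='index',
)

# Series.duplicated() marks every repeat of an earlier value as True
row = data.loc['row_1']
mask = row.duplicated()
print(mask.tolist())

# Keep the values that are neither repeats nor missing
deduplicated = row[~mask & row.notna()]
print(deduplicated.tolist())

# Same filter for every row, rebuilt the way the question builds its frames
cleaned = {
    label: r[~r.duplicated() & r.notna()].tolist()
    for label, r in data.iterrows()
}
result = pd.DataFrame.from_dict(cleaned, orient='index')
print(result)
```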

Upvotes: 0
