Nhyi
Nhyi

Reputation: 383

Checking If Identical Column Names Have Identical Values in Pandas

So I have a DataFrame that has multiple column names that are the same. I firstly separate these columns and create a DataFrame that has these duplicated column names. What I was wondering, is there anyway to double check if all of the values in the rows have the same value as each other? In which case, I know that one column is an exact duplicate of another and I can drop it?

Below is what I have gotten so far and an example DF:

import collections
duplicated_columns = [item for item, count in collections.Counter(checking_empty_list).items() if count > 1]

dupe[duplicated_columns].head()

That is a simple way to obtain the columns that appear twice in a dataframe and then I am just displaying the head. I get the following df below:

enter image description here

In this case, ID would be all true and we can delete one of the columns, however total wouldn't be true and we won't delete any as the values are different. I thought of doing but it is unable to differentiate. How would I do it for multiple duplicated columns which could differ each time?

if df['ID'] == df['ID'].all() == True:
    df.drop(['ID'])
else:
    pass

Here is the code to get it into a df in Python.

ID,ID,TOTAL,TOTAL
0.0,0.0,290,0.00
0.0,0.0,16,60.16
0.0,0.0,0,0.00
0.0,0.0,87,0.87
0.0,0.0,78,0.78
df = pd.read_clipboard(sep=',')
df = df.rename(columns = {'TOTAL.1':'TOTAL', 'ID.1':'ID'})

Any help would be appreciated.

Upvotes: 1

Views: 1169

Answers (2)

Augusto Quaglia
Augusto Quaglia

Reputation: 66

Ok, as the transpose alternative didn't work. I've tried your code.

to use all() you should delimit the expression with (); the drop method is not in place and the axis should be equals 1 (column).

if (df.loc[:,'ID'] == df.loc[:,"ID.1"]).all():
    df.drop('ID.1', axis= 1,inplace=True)
else:
    pass

df
Out[42]: 
   ID  TOTAL  TOTAL.1
0   0    290     0.00
1   0     16    60.16
2   0      0     0.00
3   0     87     0.87
4   0     78     0.78

Upvotes: 1

Augusto Quaglia
Augusto Quaglia

Reputation: 66

You can transpose the DataFrame and then use drop_duplicates.

df.T.drop_duplicates()
Out[6]: 
           0      1    2      3      4
ID       0.0   0.00  0.0   0.00   0.00
TOTAL  290.0  16.00  0.0  87.00  78.00
TOTAL    0.0  60.16  0.0   0.87   0.78

To get the desired table, just transpose it back:

new_df = df.T.drop_duplicates() 
new_df.T
Out[8]: 
    ID  TOTAL  TOTAL
0  0.0  290.0   0.00
1  0.0   16.0  60.16
2  0.0    0.0   0.00
3  0.0   87.0   0.87
4  0.0   78.0   0.78

Upvotes: 0

Related Questions