Reputation: 383
So I have a DataFrame that has multiple column names that are the same. I firstly separate these columns and create a DataFrame that has these duplicated column names. What I was wondering, is there anyway to double check if all of the values in the rows have the same value as each other? In which case, I know that one column is an exact duplicate of another and I can drop it?
Below is what I have gotten so far and an example DF:
import collections
duplicated_columns = [item for item, count in collections.Counter(checking_empty_list).items() if count > 1]
dupe[duplicated_columns].head()
That is a simple way to obtain the columns that appear twice in a dataframe and then I am just displaying the head. I get the following df below:
In this case, ID would be all true and we can delete one of the columns, however total wouldn't be true and we won't delete any as the values are different. I thought of doing but it is unable to differentiate. How would I do it for multiple duplicated columns which could differ each time?
if df['ID'] == df['ID'].all() == True:
df.drop(['ID'])
else:
pass
Here is the code to get it into a df in Python.
ID,ID,TOTAL,TOTAL
0.0,0.0,290,0.00
0.0,0.0,16,60.16
0.0,0.0,0,0.00
0.0,0.0,87,0.87
0.0,0.0,78,0.78
df = pd.read_clipboard(sep=',')
df = df.rename(columns = {'TOTAL.1':'TOTAL', 'ID.1':'ID'})
Any help would be appreciated.
Upvotes: 1
Views: 1169
Reputation: 66
Ok, as the transpose alternative didn't work. I've tried your code.
to use all() you should delimit the expression with (); the drop method is not in place and the axis should be equals 1 (column).
if (df.loc[:,'ID'] == df.loc[:,"ID.1"]).all():
df.drop('ID.1', axis= 1,inplace=True)
else:
pass
df
Out[42]:
ID TOTAL TOTAL.1
0 0 290 0.00
1 0 16 60.16
2 0 0 0.00
3 0 87 0.87
4 0 78 0.78
Upvotes: 1
Reputation: 66
You can transpose the DataFrame and then use drop_duplicates.
df.T.drop_duplicates()
Out[6]:
0 1 2 3 4
ID 0.0 0.00 0.0 0.00 0.00
TOTAL 290.0 16.00 0.0 87.00 78.00
TOTAL 0.0 60.16 0.0 0.87 0.78
To get the desired table, just transpose it back:
new_df = df.T.drop_duplicates()
new_df.T
Out[8]:
ID TOTAL TOTAL
0 0.0 290.0 0.00
1 0.0 16.0 60.16
2 0.0 0.0 0.00
3 0.0 87.0 0.87
4 0.0 78.0 0.78
Upvotes: 0