Reputation: 21
I have a csv file where every column name exists three times. That is, for every column header/name found in the file there are sets of three different columns with the same name. Below is a simplified version of what this file has.
X Y Z X Y Z X Y Z
0 1 1 1 3 3 3 5 5 5
1 2 2 2 4 4 4 6 6 6
Essentially, what I'm trying to do is to import this file but also combine the columns with the same name together, so it looks something like:
X Y Z
0 1 1 1
1 2 2 2
2 3 3 3
3 4 4 4
4 5 5 5
5 6 6 6
The main trouble I'm having is when I import this file using df = pd.read_csv(file)
, it automatically adds suffixes to the duplicated column names, making it so that I can't use some other potential fixes I've seen online (mainly having to do with groupby). I also tried seeing if I can do something with setting the argument df = pd.read_csv(file, header=None)
, but I get lost immediately once I have the headerless dataframe.
Can someone please help me out with a good way to do this?
Upvotes: 2
Views: 126
Reputation: 5648
depending on how the suffix is added, you can do this to rename them
new_cols = []
for col in df.columns:
new_cols.append(col.split('_')[0])
df.columns = new_cols
In the split() method just use what read_excel is giving (could be a . or _ or something else)
Upvotes: 0
Reputation: 323326
Try with melt
then add additional column with cumcount
then pivot
back
s = df.melt()
s['v'] = s.groupby('variable').cumcount()
out = s.pivot('v','variable','value')
out
Out[75]:
variable X Y Z
v
0 1 1 1
1 2 2 2
2 3 3 3
3 4 4 4
4 5 5 5
5 6 6 6
Upvotes: 1