tav19
tav19

Reputation: 21

How can I import a csv file in a way such that columns with the same name are combined (concatenated)?

I have a csv file where every column name exists three times. That is, for every column header/name found in the file there are sets of three different columns with the same name. Below is a simplified version of what this file has.

   X  Y  Z  X  Y  Z  X  Y  Z
0  1  1  1  3  3  3  5  5  5  
1  2  2  2  4  4  4  6  6  6

Essentially, what I'm trying to do is to import this file but also combine the columns with the same name together, so it looks something like:

   X  Y  Z
0  1  1  1  
1  2  2  2
2  3  3  3
3  4  4  4
4  5  5  5
5  6  6  6

The main trouble I'm having is when I import this file using df = pd.read_csv(file), it automatically adds suffixes to the duplicated column names, making it so that I can't use some other potential fixes I've seen online (mainly having to do with groupby). I also tried seeing if I can do something with setting the argument df = pd.read_csv(file, header=None), but I get lost immediately once I have the headerless dataframe.

Can someone please help me out with a good way to do this?

Upvotes: 2

Views: 126

Answers (2)

Jonathan Leon
Jonathan Leon

Reputation: 5648

depending on how the suffix is added, you can do this to rename them

new_cols = []
for col in df.columns:
    new_cols.append(col.split('_')[0])
df.columns = new_cols

In the split() method just use what read_excel is giving (could be a . or _ or something else)

Upvotes: 0

BENY
BENY

Reputation: 323326

Try with melt then add additional column with cumcount then pivot back

s = df.melt()
s['v'] = s.groupby('variable').cumcount()
out = s.pivot('v','variable','value')
out
Out[75]: 
variable  X  Y  Z
v                
0         1  1  1
1         2  2  2
2         3  3  3
3         4  4  4
4         5  5  5
5         6  6  6

Upvotes: 1

Related Questions