Pandas data frame from csv. Columns with same name

Question

I have a csv with a lot of columns (1314):

ColumnA   ColumnA   ColumnA   ColumnB   ColumnC   ColumnB   ColumnM
      5         9         5         1         6         8         9
      5         1         3         5         8         6         8

I would like to group by column name, summarizing the values, but when I load this csv into a data frame, the columns are renamed to:

ColumnA   ColumnA.1   ColumnA.2   ColumnB   ColumnC   ColumnB.1   ColumnM
      5           9           5         1         6           8         9
      5           1           3         5         8           6         8

So I can't group by columns...

Is there any way to create a data frame from this csv keeping the name of the columns?

jezrael · Accepted Answer

Use str.split on the column index and take the first piece of each name with str[0]:

df.columns = df.columns.str.split('.').str[0]
print (df)
   ColumnA  ColumnA  ColumnA  ColumnB  ColumnC  ColumnB  ColumnM
0        5        9        5        1        6        8        9
1        5        1        3        5        8        6        8
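
One caveat with splitting on '.': a column whose original name contains a dot (say "Col.X") would be cut as well. A slightly more defensive sketch strips only a trailing ".<digits>" suffix, which is the form read_csv appends to duplicates. The sample data and the comma delimiter below are assumptions made for illustration, and a column that legitimately ends in ".1" would still be affected:

```python
import io

import pandas as pd

# Hypothetical sample mirroring the question; the comma delimiter is an assumption.
csv_text = """ColumnA,ColumnA,ColumnA,ColumnB,ColumnC,ColumnB,ColumnM
5,9,5,1,6,8,9
5,1,3,5,8,6,8
"""

df = pd.read_csv(io.StringIO(csv_text))
mangled = list(df.columns)  # names as renamed by read_csv

# Remove only a trailing ".<digits>" suffix, leaving other dots untouched.
df.columns = df.columns.str.replace(r"\.\d+$", "", regex=True)
restored = list(df.columns)

print(mangled)
print(restored)
```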

If you want to use groupby, it is not necessary to remove the suffixes first:

df = df.groupby(lambda x: x.split('.')[0], axis=1).sum()
print (df)
   ColumnA  ColumnB  ColumnC  ColumnM
0       19        9        6        9
1        9       11        8        8
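
Note that recent pandas releases (2.1 and later, to my knowledge) deprecate the axis=1 argument of groupby. A possible workaround, sketched here rather than taken from the answer, is to transpose, group the former column labels as rows, and transpose back. The frame below is built by hand with the already-renamed columns so the snippet runs on its own:

```python
import pandas as pd

# Hand-built frame with the column names as read_csv would have renamed them.
df = pd.DataFrame(
    [[5, 9, 5, 1, 6, 8, 9], [5, 1, 3, 5, 8, 6, 8]],
    columns=["ColumnA", "ColumnA.1", "ColumnA.2",
             "ColumnB", "ColumnC", "ColumnB.1", "ColumnM"],
)

# Transpose so column labels become the row index, group rows by the
# part of the label before the first '.', sum, then transpose back.
summed = df.T.groupby(lambda name: name.split(".")[0]).sum().T
print(summed)
```

With 1314 columns the two transposes copy the data, so this trades some memory for compatibility with newer versions.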
