Reputation: 445
I have a csv with lot of columns (1314):
ColumnA ColumnA ColumnA ColumnB ColumnC ColumnB ColumnM
5 9 5 1 6 8 9
5 1 3 5 8 6 8
I would like to group by column summarizing the values, but when I try to get a data frame from this csv, the columns change their names to:
ColumnA ColumnA.1 ColumnA.2 ColumnB ColumnC ColumnB.1 ColumnM
5 9 5 1 6 8 9
5 1 3 5 8 6 8
So I can't group by columns...
Is there any way to create a data frame from this csv keeping the name of the columns?
Upvotes: 0
Views: 441
Reputation: 863256
Use Series.str.split
with indexing by str
:
df.columns = df.columns.str.split('.').str[0]
print (df)
ColumnA ColumnA ColumnA ColumnB ColumnC ColumnB ColumnM
0 5 9 5 1 6 8 9
1 5 1 3 5 8 6 8
If want use groupby then not necessary remove them:
df = df.groupby(lambda x: x.split('.')[0], axis=1).sum()
print (df)
ColumnA ColumnB ColumnC ColumnM
0 19 9 6 9
1 9 11 8 8
Upvotes: 2
Reputation: 150785
If you actual column name does not contain .
, you can change the column name back with:
df = pd.read_csv(path_to_csv)
df.columns = df.columns.str.extract('^([^\.]*)')[0].values
Output:
ColumnA ColumnA ColumnA ColumnB ColumnC ColumnB ColumnM
0 5 9 5 1 6 8 9
1 5 1 3 5 8 6 8
Upvotes: 1