Reputation: 312
I am trying to loop through multiple Excel files in pandas. The structure of the files is very similar: the first 10 columns form a key and the rest of the columns hold the values. I want to group by the first 10 columns and sum the rest.
I have searched and found solutions online for similar cases, but my problem is that:
- I have a large number of value columns (to be aggregated as a sum), and
- the number and names of the value columns differ from file to file (DataFrame to DataFrame).
The key columns are the same across all the files.
I can't share the actual data, but here is a sample of the file structure,
and here is the desired output from the above data.
It is like a groupby operation, but the large and uncertain number of value columns, with headers that change from file to file, makes it difficult to use groupby or pivot directly. Can anyone suggest the best possible way to do this in Python?
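To make the setup concrete, here is a minimal sketch of the loop, assuming the files sit in a data/ folder (the folder and glob pattern are placeholders):

import glob
import pandas as pd

for path in glob.glob("data/*.xlsx"):   # placeholder folder/pattern
    df = pd.read_excel(path)
    key_cols = list(df.columns[:10])    # same 10 key columns in every file
    value_cols = list(df.columns[10:])  # differs from file to file
    # ... aggregate value_cols by key_cols here ...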
Edited:
df.groupby(list(df.columns[:10])).agg(sum)
is working, but for some reason it takes 25-30 minutes; MS Access does the same thing in 1-2 minutes. Am I doing something wrong here, or is there another way to do it in Python itself?
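For anyone timing this, one variant worth trying (only a sketch, not a confirmed fix for the 25-30 minute run) is to use the groupby's own .sum() and pass sort=False so pandas skips sorting the group keys:

# same grouping as above, but without sorting the group keys
out = df.groupby(list(df.columns[:10]), sort=False).sum()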
Upvotes: 3
Views: 1926
Reputation: 8898
Just use df.columns, which holds the column labels; you can slice it to get the 10 leftmost columns.
This should work:
df.groupby(df.columns[:10].to_list()).sum()
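A small usage note: the grouped keys come back as the result's (Multi)Index; if you want them as ordinary columns again, chain reset_index() onto the result, e.g.:

result = df.groupby(df.columns[:10].to_list()).sum().reset_index()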
Upvotes: 1