Reputation: 1553
I wanted to calculate the percentage of one hour ('time') taken up by some object, so I tried writing a lambda function. I think it does the job, but the index columns (the columns the dataframe is grouped by) disappeared.
df = df.groupby(['id', 'name', 'time', 'object', 'type'], as_index=True, sort=False)['col1', 'col2', 'col3', 'col4', 'col5'].apply(lambda x: x * 100 / 3600).reset_index()
After running that code I printed df.columns and got this:
Index([u'index', u'col1', u'col2', u'col3',
       u'col4', u'col5'],
      dtype='object')
If needed, I can add a table with sample values for each column. Thanks in advance.
Upvotes: 3
Views: 1404
Reputation: 323306
Using the sample data from jpp's answer:
df[['col1','col2']]*=100/3600
df
Out[110]:
       col1      col2  id name
0  0.138889  0.250000   1    A
1  0.166667  0.111111   2    B
2  0.222222  0.138889   1    A
Upvotes: 1
Reputation: 76307
Moving the loop outward will make the code run significantly faster:
for c in ['col1', 'col2', 'col3', 'col4', 'col5']:
    df[c] *= 100. / 3600
This is because the calculation in each iteration is done in a vectorized way over a whole column.
This also won't modify the index in any way.
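As a minimal sketch of that point, the snippet below scales two columns in a loop and checks that the row index and the key columns are left alone. The sample values are hypothetical (borrowed from another answer in this thread), and the explicit assignment form is used here as an assumed equivalent of the in-place `*=`.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    'id': [1, 2, 1],
    'name': ['A', 'B', 'A'],
    'col1': [5, 6, 8],
    'col2': [9, 4, 5],
})
value_cols = ['col1', 'col2']
index_before = df.index.copy()

# Scale each column in turn; each multiplication is vectorized over the column.
for c in value_cols:
    df[c] = df[c] * 100.0 / 3600

# The row index is the same object-wise equal index as before the loop.
index_unchanged = df.index.equals(index_before)
```

Because no groupby or reset_index is involved, the id and name columns stay where they were.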
Upvotes: 3
Reputation: 5774
You call .reset_index(), which resets the index. Take a look at the pandas documentation and you'll see that .reset_index() transfers the index to the columns.
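A small sketch of what .reset_index() does in two situations, using a hypothetical three-column dataframe. The first case is presumably what produced the 'index' column in the question: if the result still carries a plain unnamed row index rather than the group keys, that index is what gets moved into the columns.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({'id': [1, 2], 'name': ['A', 'B'], 'col1': [5, 6]})

# With the default unnamed integer index, reset_index() adds a column
# called 'index' in front of the existing columns.
flat = df.reset_index()
flat_cols = list(flat.columns)

# With a named MultiIndex, reset_index() turns the levels back into columns.
keyed = df.set_index(['id', 'name'])
restored = keyed.reset_index()
restored_cols = list(restored.columns)
```

Here flat_cols starts with 'index', while restored_cols contains id, name and col1 again.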
Upvotes: 1
Reputation: 164693
pd.DataFrame.groupby is used to aggregate data, not to apply a function to multiple columns.
For simple functions, you should look for a vectorised solution. For example:
import pandas as pd

# set up simple dataframe
df = pd.DataFrame({'id': [1, 2, 1], 'name': ['A', 'B', 'A'],
                   'col1': [5, 6, 8], 'col2': [9, 4, 5]})
# apply logic in a vectorised way on multiple columns
df[['col1', 'col2']] = df[['col1', 'col2']].values * 100 / 3600
If you wish to set your index as multiple columns, and are keen to use pd.DataFrame.apply, this is possible as two separate steps. For example:
df = df.set_index(['id', 'name'])
df[['col1', 'col2']] = df[['col1', 'col2']].apply(lambda x: x * 100 / 3600)
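For completeness, here is a self-contained sketch of the two-step version, assuming the same small sample dataframe as above. It checks that id and name survive as index levels and shows that a final reset_index() brings them back as ordinary columns if that is the shape you need. The names `indexed`, `flat` and `value_cols` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1], 'name': ['A', 'B', 'A'],
                   'col1': [5, 6, 8], 'col2': [9, 4, 5]})
value_cols = ['col1', 'col2']

# Step 1: move the key columns into the index.
indexed = df.set_index(['id', 'name'])

# Step 2: apply the scaling column by column; the index is not touched.
indexed[value_cols] = indexed[value_cols].apply(lambda x: x * 100 / 3600)

# The keys are still there, as index levels.
index_names = list(indexed.index.names)

# Optional: turn the index levels back into regular columns.
flat = indexed.reset_index()
```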
Upvotes: 2