Reputation: 32326
How do I group by data frame based on first column after splitting data on semi colon? In this example I need to split on last column time and group by hour.
from StringIO import StringIO
myst="""india, 905034 , 19:44
USA, 905094 , 19:33
Russia, 905154 , 21:56
"""
u_cols=['country', 'index', 'current_tm']
myf = StringIO(myst)
import pandas as pd
df = pd.read_csv(StringIO(myst), sep=',', names = u_cols)
This query does not return the expected results:
df[df['index'] > 900000].groupby([df.current_tm]).size()
current_tm
21:56 1
19:33 1
19:44 1
dtype: int64
It should be :
21 1
19 2
The time is in hh:mm format but pandas consider it as string.
Is there any utility that will convert the SQL query to pandas equivalent? (something like querymongo.com that will help mongoDB users)
Upvotes: 1
Views: 343
Reputation: 109546
You can add the hour to your dataframe as follows and then use it for grouping:
df['hour'] = df.current_tm.str.strip().apply(lambda x: x.split(':')[0] if isinstance(x, str)
else None)
>>> df[df['index'] > 900000].groupby('hour').size()
hour
19 2
21 1
dtype: int64
Upvotes: 1
Reputation: 16251
Create a new column:
df['hour'] = [current_time.split(':')[0] for current_time in df['current_tm']]
Then apply your method:
df[df['index'] > 900000].groupby([df['hour']]).size()
hour
19 2
21 1
dtype: int64
Upvotes: 1