Reputation: 23
Sample DataFrame :
process_id | app_path | start_time
the desired output data frame should be multi-Indexed based on the date and time value in start_time column with unique dates as first level of index and one hour range as second level of index the count of records in each time slot should be calculated
def activity(self):
# find unique dates from db file
columns = self.df['start_time'].map(lambda x: x.date()).unique()
result = pandas.DataFrame(np.zeros((1,len(columns))), columns = columns)
for i in range(len(self.df)):
col = self.df.iloc[i]['start_time'].date()
result[col][0] = result.get_value(0, col) + 1
return result
I have tried the above code which gives the output as :
15-07-2014 16-7-2014 17-07-2014 18-07-2014
3217 2114 1027 3016
I want to count records on per hour basis as well
Upvotes: 2
Views: 6392
Reputation: 60756
It would be helpful to start your question with some sample data. Since you didn't, I assumed the following is representative of your data (looks like app_path
was not being used):
rng = pd.date_range('1/1/2011', periods=10000, freq='1Min')
df = pd.DataFrame(randint(size=len(rng), low=100, high = 500), index=rng)
df.columns = ['process_id']
It looks like you could benefit from exploring the groupby
method in Pandas data frames. Using groupby, your example above become a simple one-liner:
df.groupby( [df.index.year, df.index.month, df.index.day] ).count()
and grouping by hour means simply adding hour to the group:
df.groupby( [df.index.year, df.index.month, df.index.day, df.index.hour] ).count()
Don't recreate the wheel in Pandas, use the methods provided for much more readable, as well as faster, code.
Upvotes: 7