counting records on per hour, per day and create multindex DataFrame as output

Question

Sample DataFrame :

process_id | app_path | start_time

the desired output data frame should be multi-Indexed based on the date and time value in start_time column with unique dates as first level of index and one hour range as second level of index the count of records in each time slot should be calculated

def activity(self):
    # find unique dates from db file
    columns = self.df['start_time'].map(lambda x: x.date()).unique()

    result = pandas.DataFrame(np.zeros((1,len(columns))), columns = columns)
    for i in range(len(self.df)):
        col = self.df.iloc[i]['start_time'].date()
        result[col][0] = result.get_value(0, col) + 1

    return result

I have tried the above code which gives the output as :

15-07-2014 16-7-2014 17-07-2014 18-07-2014 3217 2114 1027 3016 I want to count records on per hour basis as well

JD Long · Accepted Answer

It would be helpful to start your question with some sample data. Since you didn't, I assumed the following is representative of your data (looks like app_path was not being used):

rng = pd.date_range('1/1/2011', periods=10000, freq='1Min')
df = pd.DataFrame(randint(size=len(rng), low=100, high = 500), index=rng)
df.columns = ['process_id']

It looks like you could benefit from exploring the groupby method in Pandas data frames. Using groupby, your example above become a simple one-liner:

df.groupby( [df.index.year, df.index.month, df.index.day] ).count()

and grouping by hour means simply adding hour to the group:

df.groupby( [df.index.year, df.index.month, df.index.day, df.index.hour] ).count()

Don't recreate the wheel in Pandas, use the methods provided for much more readable, as well as faster, code.

counting records on per hour, per day and create multindex DataFrame as output

Answers (1)

Related Questions