Array of values at the same hour every day (5 days)

I have a DataFrame like this:

                            0         1         2
date                                    
2000-06-01 10:00:00  0.040457  0.326594  0.492136 
2000-06-01 11:00:00  0.279323  0.877446  0.464523
2000-06-02 10:00:00  0.328068  0.837669  0.608559
2000-06-02 11:00:00  0.107959  0.678297  0.517435
2000-06-03 10:00:00  0.131555  0.418380  0.025725
2000-06-03 11:00:00  0.999961  0.619517  0.206108
2000-06-04 10:00:00  0.129270  0.024533  0.154769
2000-06-04 11:00:00  0.441010  0.741781  0.470402
2000-06-05 10:00:00  0.682101  0.375660  0.009916
2000-06-05 11:00:00  0.754488  0.352293  0.339337
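
For anyone reproducing this, a minimal sketch that builds a frame of the same shape (the numbers are random, so they won't match the ones above):

import numpy as np
import pandas as pd

# hourly range over the five days, kept only at 10:00 and 11:00
idx = pd.date_range('2000-06-01 10:00', '2000-06-05 11:00', freq='H', name='date')
idx = idx[idx.hour.isin([10, 11])]

a = pd.DataFrame(np.random.rand(len(idx), 3), index=idx, columns=[0, 1, 2])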

I need to get an array of all the values for each hour of the day, across all days. The result should be a DataFrame like:

10:00:00  [[0.040457, 0.326594, 0.492136], [0.328068, 0.837669, 0.608559], ..., [0.682101, 0.375660, 0.009916]]
11:00:00  [[0.279323, 0.877446, 0.464523], [0.107959, 0.678297, 0.517435], ..., [0.754488, 0.352293, 0.339337]]

I tried this one:

locs = a.index.indexer_at_time('11:00:00')
a.iloc[locs]

But it doesn't give me what I need. Maybe there is a way to do it via NumPy? Important: loops shouldn't be used, and ideally no if statements either.

Upvotes: 1

Views: 81

Answers (3)

greg_data

Reputation: 2293

Assuming I've understood correctly, what you want is an object with one row for each distinct hour in the data, where each row holds the values of columns 0, 1 and 2 for that hour as an array. The following will do this:

# get the hour from the DatetimeIndex as a column
x['hour'] = x.index.hour

# group by hour, using apply to collect the values
# of just the columns you want (0, 1 and 2):
by_hours = x.groupby('hour').apply(lambda g: g[[0, 1, 2]].values)

Which returns:

hour
10    [[0.040457, 0.326594, 0.492136], [0.328068, 0....
11    [[0.279323, 0.877446, 0.464523], [0.107959, 0....
dtype: object

Now you can access the values within each 'hour' as follows:

by_hours.loc[10]

returns:

array([[ 0.040457,  0.326594,  0.492136],
       [ 0.328068,  0.837669,  0.608559],
       [ 0.131555,  0.41838 ,  0.025725],
       [ 0.12927 ,  0.024533,  0.154769],
       [ 0.682101,  0.37566 ,  0.009916]])
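
If you would rather end up with a single 3-D NumPy array than a Series of arrays, the per-hour blocks can be stacked (a sketch, assuming every hour appears on the same number of days; cube is just a name I picked):

import numpy as np

# shape (n_hours, n_days, n_columns) -> here (2, 5, 3)
cube = np.stack(by_hours.tolist())

cube[0] is then the 10:00 block and cube[1] the 11:00 block.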

Upvotes: 2

Bharath M Shetty

Reputation: 30605

You can use pivot_table to do that by creating a column of hours:

# pull the hour out of the DatetimeIndex as a column
df['hour'] = df.index.hour
# one row per hour; columns are (original column, day number within that hour)
ndf = df.pivot_table(index='hour', columns=df.groupby('hour').cumcount(), values=[0, 1, 2])

Output:

            0                                                1            \
             0         1         2        3         4         0         1   
hour                                                                        
10    0.040457  0.328068  0.131555  0.12927  0.682101  0.326594  0.837669   
11    0.279323  0.107959  0.999961  0.44101  0.754488  0.877446  0.678297   

                                           2                                \
             2         3         4         0         1         2         3   
hour                                                                         
10    0.418380  0.024533  0.375660  0.492136  0.608559  0.025725  0.154769   
11    0.619517  0.741781  0.352293  0.464523  0.517435  0.206108  0.470402   


             4  
hour            
10    0.009916  
11    0.339337  

You can also use .loc to get the specific hour's data, i.e.

ndf.loc[10]

Output :

0  0    0.040457
   1    0.328068
   2    0.131555
   3    0.129270
   4    0.682101
1  0    0.326594
   1    0.837669
   2    0.418380
   3    0.024533
   4    0.375660
2  0    0.492136
   1    0.608559
   2    0.025725
   3    0.154769
   4    0.009916
Name: 10, dtype: float64
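
If you want that row back as the 5x3 array from the question, you can unstack the first column level (a sketch):

# rows are the day number within the hour, columns are the original columns 0, 1, 2
ndf.loc[10].unstack(level=0).values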

Upvotes: 2

Chankey Pathak

Reputation: 21676

Apply a function to the DataFrame that filters the values by comparing the time.

output = []


def filter_data(row):
    values = [row[0], row[1], row[2]]
    # row.name is the row's timestamp from the DatetimeIndex
    if row.name.strftime('%H:%M:%S') == '10:00:00':
        output.append(values)

df.apply(filter_data, axis=1)

print(output)
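
A variant of the same idea that collects every hour at once instead of hard-coding 10:00:00 (a sketch; output_by_hour is my own name):

from collections import defaultdict

output_by_hour = defaultdict(list)

def collect(row):
    # key by the row's time of day, taken from the DatetimeIndex
    output_by_hour[row.name.strftime('%H:%M:%S')].append([row[0], row[1], row[2]])

df.apply(collect, axis=1)

print(output_by_hour['10:00:00'])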

Upvotes: 0
