Reputation: 87
I have a DataFrame like this:
                            0         1         2
date
2000-06-01 10:00:00  0.040457  0.326594  0.492136
2000-06-01 11:00:00  0.279323  0.877446  0.464523
2000-06-02 10:00:00  0.328068  0.837669  0.608559
2000-06-02 11:00:00  0.107959  0.678297  0.517435
2000-06-03 10:00:00  0.131555  0.418380  0.025725
2000-06-03 11:00:00  0.999961  0.619517  0.206108
2000-06-04 10:00:00  0.129270  0.024533  0.154769
2000-06-04 11:00:00  0.441010  0.741781  0.470402
2000-06-05 10:00:00  0.682101  0.375660  0.009916
2000-06-05 11:00:00  0.754488  0.352293  0.339337
For each hour of the day, I need an array of all the values recorded at that hour across every day. The result should be a DataFrame like:
10:00:00 [[0.040457, 0.326594, 0.492136], [0.328068, 0.837669, 0.608559], ..., [0.682101, 0.375660, 0.009916]]
11:00:00 [[0.279323, 0.877446, 0.464523], [0.107959, 0.678297, 0.517435], ..., [0.754488, 0.352293, 0.339337]]
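I tried this:
# positions of the rows whose time of day is 11:00
locs = a.index.indexer_at_time('11:00:00')
a.iloc[locs]
But this only selects the rows for one time of day at a time, which isn't what I need. Is there maybe a way to do it with NumPy? Important: loops shouldn't be used, and ideally no if statements either, if possible.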
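For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame and runs the attempt above. It assumes `date` is a DatetimeIndex and that the column labels are the integers 0, 1, 2; the question doesn't state either. `indexer_at_time` returns the row positions for a single time of day, which is why the attempt only ever gives one hour's rows.

```python
import pandas as pd

# Sample frame from the question. Assumptions: 'date' is a DatetimeIndex and
# the columns are the integers 0, 1, 2 (pandas' default labels).
# The comprehension below is only test-data setup.
days = pd.date_range("2000-06-01", periods=5, freq="D")
index = pd.DatetimeIndex(
    [d + pd.Timedelta(hours=h) for d in days for h in (10, 11)], name="date"
)
a = pd.DataFrame(
    [
        [0.040457, 0.326594, 0.492136],
        [0.279323, 0.877446, 0.464523],
        [0.328068, 0.837669, 0.608559],
        [0.107959, 0.678297, 0.517435],
        [0.131555, 0.418380, 0.025725],
        [0.999961, 0.619517, 0.206108],
        [0.129270, 0.024533, 0.154769],
        [0.441010, 0.741781, 0.470402],
        [0.682101, 0.375660, 0.009916],
        [0.754488, 0.352293, 0.339337],
    ],
    index=index,
)

# The attempt from the question: integer positions of the 11:00 rows only
locs = a.index.indexer_at_time("11:00:00")
print(a.iloc[locs])
```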
Upvotes: 1
Views: 81
Reputation: 2293
Assuming I've understood correctly, you want an object with one row per distinct hour in the data, where each row holds an array of all the values in columns 0, 1 and 2 for that hour. The following does this:
#get the hour as a column (assumes 'date' is a datetime column; if it is the index, use x.index.hour)
x['hour'] = x.date.dt.hour
#group by hour, then for each group keep only the value columns and take them as a 2-D array
#(the labels here are the strings '0', '1', '2'; use [0, 1, 2] if they are integers)
by_hours = x.groupby('hour').apply(lambda g: g[['0','1','2']].values)
Which returns:
hour
10 [[0.040457, 0.326594, 0.492136], [0.328068, 0....
11 [[0.279323, 0.877446, 0.464523], [0.107959, 0....
dtype: object
Now you can access the values within each 'hour' as follows:
by_hours.loc[10]
returns:
array([[ 0.040457, 0.326594, 0.492136],
[ 0.328068, 0.837669, 0.608559],
[ 0.131555, 0.41838 , 0.025725],
[ 0.12927 , 0.024533, 0.154769],
[ 0.682101, 0.37566 , 0.009916]])
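A self-contained sketch of the same groupby idea on the sample data. It assumes `date` is a DatetimeIndex and the columns are the integers 0, 1, 2, and it groups on `x.index.hour` directly instead of adding a helper column.

```python
import pandas as pd

# Sample data (assumed: DatetimeIndex named 'date', integer columns 0, 1, 2)
days = pd.date_range("2000-06-01", periods=5, freq="D")
index = pd.DatetimeIndex(
    [d + pd.Timedelta(hours=h) for d in days for h in (10, 11)], name="date"
)
x = pd.DataFrame(
    [
        [0.040457, 0.326594, 0.492136],
        [0.279323, 0.877446, 0.464523],
        [0.328068, 0.837669, 0.608559],
        [0.107959, 0.678297, 0.517435],
        [0.131555, 0.418380, 0.025725],
        [0.999961, 0.619517, 0.206108],
        [0.129270, 0.024533, 0.154769],
        [0.441010, 0.741781, 0.470402],
        [0.682101, 0.375660, 0.009916],
        [0.754488, 0.352293, 0.339337],
    ],
    index=index,
)

# Group on the hour of the index and turn each group's rows into a 2-D array;
# the result is a Series of arrays indexed by hour.
by_hours = x.groupby(x.index.hour).apply(lambda g: g[[0, 1, 2]].to_numpy())

print(by_hours.loc[10])
```

If you need a DataFrame rather than a Series, `by_hours.to_frame("values")` wraps it in a single column.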
Upvotes: 2
Reputation: 30605
You can use a pivot table to do this by first creating an 'hour' column:
df['hour'] = df.reset_index()['date'].dt.hour.values
# rows: hour; columns: (original column, n-th occurrence of that hour); values: columns 0, 1, 2
ndf = df.pivot_table(index=df.hour, columns=df.groupby(df['hour']).cumcount(), values=[0, 1, 2])
Output:
             0                                                  1            \
             0         1         2        3         4         0         1
hour
10    0.040457  0.328068  0.131555  0.12927  0.682101  0.326594  0.837669
11    0.279323  0.107959  0.999961  0.44101  0.754488  0.877446  0.678297

                                         2                                \
             2         3         4         0         1         2         3
hour
10    0.418380  0.024533  0.375660  0.492136  0.608559  0.025725  0.154769
11    0.619517  0.741781  0.352293  0.464523  0.517435  0.206108  0.470402

             4
hour
10    0.009916
11    0.339337
You can also use .loc to get a specific hour's data, e.g.:
ndf.loc[10]
Output:
0  0    0.040457
   1    0.328068
   2    0.131555
   3    0.129270
   4    0.682101
1  0    0.326594
   1    0.837669
   2    0.418380
   3    0.024533
   4    0.375660
2  0    0.492136
   1    0.608559
   2    0.025725
   3    0.154769
   4    0.009916
Name: 10, dtype: float64
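A runnable sketch of the pivot-table approach on the sample data, under the same assumptions (DatetimeIndex named `date`, integer columns 0, 1, 2). It passes `values=[0, 1, 2]` as column labels and then uses `unstack(0)` to turn one hour's row back into a (days, columns) array like the one the question asks for. Since each cell holds a single value, the default `mean` aggregation returns that value unchanged.

```python
import pandas as pd

# Sample data (assumed: DatetimeIndex named 'date', integer columns 0, 1, 2)
days = pd.date_range("2000-06-01", periods=5, freq="D")
index = pd.DatetimeIndex(
    [d + pd.Timedelta(hours=h) for d in days for h in (10, 11)], name="date"
)
df = pd.DataFrame(
    [
        [0.040457, 0.326594, 0.492136],
        [0.279323, 0.877446, 0.464523],
        [0.328068, 0.837669, 0.608559],
        [0.107959, 0.678297, 0.517435],
        [0.131555, 0.418380, 0.025725],
        [0.999961, 0.619517, 0.206108],
        [0.129270, 0.024533, 0.154769],
        [0.441010, 0.741781, 0.470402],
        [0.682101, 0.375660, 0.009916],
        [0.754488, 0.352293, 0.339337],
    ],
    index=index,
)

# Helper column with the hour of each row, as in the answer
df["hour"] = df.index.hour

# Rows: hour. Columns: (original column, n-th occurrence of that hour).
ndf = df.pivot_table(
    index="hour",
    columns=df.groupby("hour").cumcount(),
    values=[0, 1, 2],
)

# One hour's data as a Series, reshaped back to a (days, columns) array
arr_10 = ndf.loc[10].unstack(0).to_numpy()
print(arr_10)
```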
Upvotes: 2
Reputation: 21676
Apply a function row by row that filters the values by comparing the time:
output = []

def filter_data(row):
    # collect the values of columns 0, 1 and 2 for this row
    values = [row[0], row[1], row[2]]
    # assumes 'date' is a string column such as '2000-06-01 10:00:00'
    if row.date.split()[1] == '10:00:00':
        output.append(values)

df.apply(filter_data, axis=1)
print(output)
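Note that `apply(..., axis=1)` still calls the Python function once per row, so it doesn't meet the question's no-loops requirement. If every day contains exactly the same hours in the same order (true for the sample, but an assumption in general), one possible loop-free NumPy sketch is a plain reshape. This is a swapped-in technique, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data (assumed: sorted by date, the same hours every day, integer columns)
days = pd.date_range("2000-06-01", periods=5, freq="D")
index = pd.DatetimeIndex(
    [d + pd.Timedelta(hours=h) for d in days for h in (10, 11)], name="date"
)
df = pd.DataFrame(
    [
        [0.040457, 0.326594, 0.492136],
        [0.279323, 0.877446, 0.464523],
        [0.328068, 0.837669, 0.608559],
        [0.107959, 0.678297, 0.517435],
        [0.131555, 0.418380, 0.025725],
        [0.999961, 0.619517, 0.206108],
        [0.129270, 0.024533, 0.154769],
        [0.441010, 0.741781, 0.470402],
        [0.682101, 0.375660, 0.009916],
        [0.754488, 0.352293, 0.339337],
    ],
    index=index,
)

hours = np.unique(df.index.hour)   # distinct hours, sorted
values = df[[0, 1, 2]].to_numpy()  # shape (n_days * n_hours, 3)

# (n_days * n_hours, 3) -> (n_days, n_hours, 3) -> (n_hours, n_days, 3)
by_hour = values.reshape(-1, len(hours), values.shape[1]).transpose(1, 0, 2)

# by_hour[i] holds every row recorded at hours[i]
print(hours[0], by_hour[0])
```

The reshape relies entirely on the row order, so a missing or extra hour on any day would silently misalign the result; the groupby-based answers don't have that restriction.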
Upvotes: 0