Reputation: 623
I have the following dataframe:
dates,values
2014-10-01 00:00,10.606
2014-10-01 01:00,10.595
2014-10-01 02:00,10.583
2014-10-01 03:00,10.572
2014-10-01 04:00,10.56
2014-10-01 05:00,10.564
2014-10-01 06:00,10.65
2014-10-01 07:00,10.801
2014-10-01 08:00,10.977
2014-10-01 09:00,11.316
2014-10-01 10:00,11.88
2014-10-01 11:00,12.427
2014-10-01 12:00,12.751
2014-10-01 13:00,12.863
2014-10-01 14:00,12.823
2014-10-01 15:00,12.686
2014-10-01 16:00,12.499
2014-10-01 17:00,12.293
2014-10-01 18:00,12.086
2014-10-01 19:00,11.89
2014-10-01 20:00,11.712
2014-10-01 21:00,11.552
2014-10-01 22:00,11.413
2014-10-01 23:00,11.292
2014-10-02 00:00,11.188
2014-10-02 01:00,11.1
Let's say I want to select all the data for a specific day, for example 2014-10-01. These are the operations I use in my code:
dfr = pd.read_csv(f_name, parse_dates=True, index_col=0,
                  infer_datetime_format=True)
yy = dfr[dfr.index.floor('D') == ' 2014-10-01 00:00:00'].to_numpy()
This is what I get:
array([[10.606],
[10.595],
[10.583],
[10.572],
[10.56 ],
[10.564],
[10.65 ],
[10.801],
[10.977],
[11.316],
[11.88 ],
[12.427],
[12.751],
[12.863],
[12.823],
[12.686],
[12.499],
[12.293],
[12.086],
[11.89 ],
[11.712],
[11.552],
[11.413],
[11.292]])
However, I would like to have yy in the following form:
array([10.606,10.595,10.583,10.572,10.56 ,10.564,10.65 ,10.801,10.977, 11.316,11.88 ,12.427,12.751,12.863,12.823,12.686,12.499,12.293,12.086,11.89 ,11.712,11.552,11.413,11.292])
I need this because I have to work with another vector, xx:
xx=array([ 2.91833891, 2.84972246, 0.50386982, 5.35302713, 4.81822114,
3.33330121, 5.63819964, 11.20447123, 12.98512414, 9.95449998,
5.78945234, 9.90594599, 1.25708361, 3.02603884, 1.02683686,
3.84912813, 1.55641116, 13.04097404, 9.6277124 , 10.73849736,
5.39958019, 3.43633323, 13.5965677 , 7.31914519])
This would let me use np.sum and similar functions without writing explicit loops.
Thanks in advance
Upvotes: 0
Views: 45
Reputation: 120469
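In fact, what you need is a Series, not a DataFrame. There are two options.
Pass the squeeze=True parameter to read_csv (this parameter was deprecated in pandas 1.4 and removed in 2.0; on newer versions, call .squeeze("columns") on the result instead):
dfr = pd.read_csv(f_name, parse_dates=True, index_col=0,
                  infer_datetime_format=True, squeeze=True)
Or use the ravel function:
>>> dfr[dfr.index.floor('D') == ' 2014-10-01 00:00:00'].to_numpy().ravel()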
array([10.606, 10.595, 10.583, 10.572, 10.56 , 10.564, 10.65 , 10.801,
10.977, 11.316, 11.88 , 12.427, 12.751, 12.863, 12.823, 12.686,
12.499, 12.293, 12.086, 11.89 , 11.712, 11.552, 11.413, 11.292])
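On recent pandas versions, where read_csv no longer accepts the squeeze keyword, the same idea can be sketched with DataFrame.squeeze("columns"). The inline CSV is a shortened stand-in for the question's file, and csv_text and ser are illustrative names, not part of the original code.

```python
import io
import pandas as pd

# Shortened stand-in for the question's CSV file (assumption: same layout).
csv_text = """dates,values
2014-10-01 00:00,10.606
2014-10-01 01:00,10.595
2014-10-01 02:00,10.583
2014-10-02 00:00,11.188
"""

# Read as a DataFrame, then collapse the single column into a Series.
ser = pd.read_csv(
    io.StringIO(csv_text), parse_dates=True, index_col=0
).squeeze("columns")

# Boolean mask on the day; indexing a Series gives a 1-D result.
yy = ser[ser.index.floor("D") == pd.Timestamp("2014-10-01")].to_numpy()
print(yy.shape)
```

Because ser is a Series, to_numpy() returns a one-dimensional array directly and no ravel call is needed.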
Upvotes: 0
Reputation: 24314
Use loc:
yy = dfr.loc[dfr.index.floor('D') == ' 2014-10-01 00:00:00', 'values'].to_numpy()
Or use flatten():
yy = dfr[dfr.index.floor('D') == ' 2014-10-01 00:00:00'].to_numpy().flatten()
# yy = dfr[dfr.index.floor('D') == ' 2014-10-01 00:00:00'].to_numpy().ravel()
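Both flatten() and ravel() give the same one-dimensional values here. The difference is that flatten() always copies, while ravel() returns a view when it can. A minimal sketch on a small hand-made column array (the array a is an illustrative stand-in, not the question's data):

```python
import numpy as np

# Column-shaped array, like the (n, 1) result in the question.
a = np.array([[10.606], [10.595], [10.583]])

flat_copy = a.flatten()  # always a new array
flat_view = a.ravel()    # a view here, because a is contiguous

# Same values and the same 1-D shape either way.
same_values = np.array_equal(flat_copy, flat_view)

# Only the ravel() result shares memory with the original.
copy_shares = np.shares_memory(a, flat_copy)
view_shares = np.shares_memory(a, flat_view)
print(flat_copy.shape, same_values, copy_shares, view_shares)
```

For a 24-element array the cost of copying is negligible, so the choice is mostly a matter of taste.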
Upvotes: 1
Reputation: 195528
Another solution: use df.loc to select only one column:
yy = dfr.loc[
    dfr.index.floor("D") == " 2014-10-01 00:00:00", "values"
].to_numpy()
print(yy)
Prints:
[10.606 10.595 10.583 10.572 10.56 10.564 10.65 10.801 10.977 11.316
11.88 12.427 12.751 12.863 12.823 12.686 12.499 12.293 12.086 11.89
11.712 11.552 11.413 11.292]
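As a possible variation, not taken from the answers above, a DatetimeIndex also supports partial string indexing in loc, so the day can be passed as a string. The sketch below uses a shortened stand-in for the question's file and a three-element xx, both of which are assumptions, and shows why the one-dimensional shape matters when combining with xx.

```python
import io
import numpy as np
import pandas as pd

# Shortened stand-in for the question's CSV file (assumption: same layout).
csv_text = """dates,values
2014-10-01 00:00,10.606
2014-10-01 01:00,10.595
2014-10-01 02:00,10.583
2014-10-02 00:00,11.188
"""
dfr = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col=0)

# Partial string indexing: every row whose timestamp falls on that day.
yy = dfr.loc["2014-10-01", "values"].to_numpy()

# Illustrative xx with the same length as yy (first values from the question).
xx = np.array([2.91833891, 2.84972246, 0.50386982])

# With 1-D operands the product is element-wise, so np.sum needs no loop.
total = np.sum(xx * yy)

# With the (n, 1) column shape, broadcasting would build an n-by-n matrix.
broadcast_shape = (xx * yy.reshape(-1, 1)).shape
print(yy.shape, broadcast_shape)
```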
Upvotes: 0