diedro

Reputation: 623

Dealing with NumPy arrays and DataFrame columns

I have the following dataframe:

dates,values
2014-10-01 00:00,10.606
2014-10-01 01:00,10.595
2014-10-01 02:00,10.583
2014-10-01 03:00,10.572
2014-10-01 04:00,10.56
2014-10-01 05:00,10.564
2014-10-01 06:00,10.65
2014-10-01 07:00,10.801
2014-10-01 08:00,10.977
2014-10-01 09:00,11.316
2014-10-01 10:00,11.88
2014-10-01 11:00,12.427
2014-10-01 12:00,12.751
2014-10-01 13:00,12.863
2014-10-01 14:00,12.823
2014-10-01 15:00,12.686
2014-10-01 16:00,12.499
2014-10-01 17:00,12.293
2014-10-01 18:00,12.086
2014-10-01 19:00,11.89
2014-10-01 20:00,11.712
2014-10-01 21:00,11.552
2014-10-01 22:00,11.413
2014-10-01 23:00,11.292
2014-10-02 00:00,11.188
2014-10-02 01:00,11.1

Let's say that I want to select all the data for a specific day, for example 2014-10-01. These are the operations used in my code:

dfr = pd.read_csv(f_name, parse_dates=True, index_col=0,
                  infer_datetime_format=True)

yy = dfr[dfr.index.floor('D') == '2014-10-01 00:00:00'].to_numpy()

This is what I get:

array([[10.606],
       [10.595],
       [10.583],
       [10.572],
       [10.56 ],
       [10.564],
       [10.65 ],
       [10.801],
       [10.977],
       [11.316],
       [11.88 ],
       [12.427],
       [12.751],
       [12.863],
       [12.823],
       [12.686],
       [12.499],
       [12.293],
       [12.086],
       [11.89 ],
       [11.712],
       [11.552],
       [11.413],
       [11.292]])

However, I would like to have yy in the following form:

array([10.606,10.595,10.583,10.572,10.56 ,10.564,10.65 ,10.801,10.977, 11.316,11.88 ,12.427,12.751,12.863,12.823,12.686,12.499,12.293,12.086,11.89 ,11.712,11.552,11.413,11.292])

The reason is that I have to combine it with another vector, xx, which is:

xx=array([ 2.91833891,  2.84972246,  0.50386982,  5.35302713,  4.81822114,
        3.33330121,  5.63819964, 11.20447123, 12.98512414,  9.95449998,
        5.78945234,  9.90594599,  1.25708361,  3.02603884,  1.02683686,
        3.84912813,  1.55641116, 13.04097404,  9.6277124 , 10.73849736,
        5.39958019,  3.43633323, 13.5965677 ,  7.31914519])

This would let me use np.sum and similar functions without having to write loops.

Thanks in advance

Upvotes: 0

Views: 45

Answers (3)

Corralien

Reputation: 120469

In fact what you need is a Series not a DataFrame:

  1. At file level, pass squeeze=True to read_csv so that a single-column file is returned as a Series. Note that this parameter was deprecated in pandas 1.4 and removed in 2.0; on newer versions, append .squeeze("columns") to the read_csv call instead:
dfr = pd.read_csv(f_name, parse_dates=True,index_col=0,
                  infer_datetime_format=True, squeeze=True)
  2. Use NumPy's ravel method on the selected array:
>>> dfr[dfr.index.floor('D')  == ' 2014-10-01 00:00:00'].to_numpy().ravel()

array([10.606, 10.595, 10.583, 10.572, 10.56 , 10.564, 10.65 , 10.801,
       10.977, 11.316, 11.88 , 12.427, 12.751, 12.863, 12.823, 12.686,
       12.499, 12.293, 12.086, 11.89 , 11.712, 11.552, 11.413, 11.292])
  3. Use one of the solutions proposed by @AnuragDabas or @AndrejKesely.
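To illustrate why a Series helps here, the following is a minimal sketch comparing the array shape you get from a single-column DataFrame with the shape you get from the equivalent Series. The inline CSV is a shortened copy of the question's data, and the names csv_text, ser and mask are only illustrative. It uses .squeeze("columns") rather than the squeeze=True argument, on the assumption that a recent pandas version is installed.

```python
import io

import pandas as pd

# Shortened copy of the question's data (three rows on the target day).
csv_text = """dates,values
2014-10-01 00:00,10.606
2014-10-01 01:00,10.595
2014-10-01 23:00,11.292
2014-10-02 00:00,11.188
"""

dfr = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col=0)

# Turn the one-column DataFrame into a Series.
ser = dfr.squeeze("columns")

# Boolean mask for the rows that fall on 2014-10-01.
mask = dfr.index.floor("D") == "2014-10-01"

shape_2d = dfr[mask].to_numpy().shape  # DataFrame -> 2-D array, one column
shape_1d = ser[mask].to_numpy().shape  # Series -> 1-D array

print(shape_2d, shape_1d)
```

The DataFrame keeps its column axis, so to_numpy() returns one row per timestamp and one column, while the Series has no column axis and converts directly to a flat array.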

Upvotes: 0

Anurag Dabas

Reputation: 24314

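Use loc to select only the values column, which gives a 1-D array:

yy=dfr.loc[dfr.index.floor('D')  == ' 2014-10-01 00:00:00','values'].to_numpy()

Or flatten the 2-D array with flatten() (ravel() also works):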

yy=dfr[dfr.index.floor('D')  == ' 2014-10-01 00:00:00'].to_numpy().flatten()
#yy=dfr[dfr.index.floor('D')  == ' 2014-10-01 00:00:00'].to_numpy().ravel()
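Both flatten() and ravel() give the same 1-D result, but they differ in whether the data is copied. The sketch below, using a small hand-made array in place of the real data (the names col, flat and rav are illustrative), shows the difference with np.shares_memory.

```python
import numpy as np

# Stand-in for the (n, 1) array returned by DataFrame.to_numpy().
col = np.array([[10.606], [10.595], [10.583]])

flat = col.flatten()  # always returns a copy
rav = col.ravel()     # returns a view when the memory layout allows it

flat_shares = np.shares_memory(col, flat)
rav_shares = np.shares_memory(col, rav)

print(flat.shape, rav.shape, flat_shares, rav_shares)
```

For read-only work such as np.sum the choice rarely matters. It only becomes relevant if you modify the result in place, since changes to a view made by ravel() also show up in the original array.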

Upvotes: 1

Andrej Kesely

Reputation: 195528

Another solution is to use .loc to select only one column:

yy = dfr.loc[
    dfr.index.floor("D") == " 2014-10-01 00:00:00", "values"
].to_numpy()
print(yy)

Prints:

[10.606 10.595 10.583 10.572 10.56  10.564 10.65  10.801 10.977 11.316
 11.88  12.427 12.751 12.863 12.823 12.686 12.499 12.293 12.086 11.89
 11.712 11.552 11.413 11.292]
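As a possible variation, pandas also supports partial string indexing on a DatetimeIndex, so the day can be selected without building a boolean mask. The sketch below assumes the index was parsed as datetimes, and uses a shortened copy of the question's data together with the first three entries of xx; csv_text and total are illustrative names.

```python
import io

import numpy as np
import pandas as pd

# Shortened copy of the question's data.
csv_text = """dates,values
2014-10-01 00:00,10.606
2014-10-01 01:00,10.595
2014-10-01 02:00,10.583
2014-10-02 00:00,11.188
"""

dfr = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col=0)

# A date string with day resolution selects every row on that day;
# naming the column gives a Series, hence a 1-D array.
yy = dfr.loc["2014-10-01", "values"].to_numpy()

# First three entries of xx from the question, to match the shortened data.
xx = np.array([2.91833891, 2.84972246, 0.50386982])

# With both arrays 1-D, element-wise operations need no explicit loop.
total = np.sum(xx * yy)

print(yy.shape, total)
```

Because xx and yy have the same 1-D shape, the product is element-wise. With the original (n, 1) array, xx * yy would instead broadcast to an (n, n) matrix, which is usually not what is wanted.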

Upvotes: 0
