user5606009
user5606009

Reputation:

how to iterate based on a condition, and assign aggregated value to a row in new dataframe in R

I have a large dataset of stock prices with 203615 rows and 2 columns(price and Timestamp). in below format

price(USD) | Timestamp

3.5 | 2014-01-01 20:00:00

2 | 2014-01-01 20:15:00

5 | 2014-01-01 20:15:00

----


4 | 2014-01-31 23:00:00

5 | 2014-01-31 23:00:00

4.5 | 2014-01-31 23:00:00

203615 2.3 | 2014-01-31 23:00:00

Time stamp varies from "2014-01-01 20:00:00" to "2014-01-31 23:00:00" with intervals of 15min(rounded to 15min). i have several transactions on same timestamp. I have to group rows based on timestamp with difference of one day, and caluclate min,max and mean of the price and no of rows within the timestamp limits and assign them to a row in a new dataframe for every iteration until it reaches the end timestamp("2014-01-31 23:00:00") from starting date('2014-01-02 20:00:00") note: iteration has to be done for every 15min

i have tried while loop. please help me with this and suggest me if i can use any packages

Upvotes: 0

Views: 49

Answers (1)

sconfluentus
sconfluentus

Reputation: 4993

This is my own code which I used as a way of creating a window of time (the prior 24 hours) to iterate over and create min and max values for a project I am working on... inter is the inteval I worked on in the loop raw is the data frame name i is the specific row from which the datetime column was selected from raw

I started my intervals at 97th row ( (i in 97:nrow(raw) ) because the stamps were taken at 15 minute intervals and I wanted a 24 hour backward window, so I needed to leave 96 intervals to pull from...I could not reach back into time I had no data for...so I started far enough into my data to leave room for those intervals.

for (i in 97:nrow(raw)){ inter=raw$datetime[i] - as.difftime(24, unit='hours') raw$deltaAirTemp_24[i] <-max(temp$Air.Temperature)- min(temp$Air.Temperature) }

The key is getting into a real date time format. Run str() on the field with the dates, if the come back as anything but Factor, use:

as.POSIXct(yourdate$field, %Y-%m-%d %H:%M:%S)

If they come back from str(yourdatecolumn here) as FACTOR then wrap it in as.POSIXct(as.character(yourdate$field), %Y-%m-%d %H:%M:%S) to be sure it does not coerce the date into a Level number then time..

Get them into a consistent date format, then construct something like above to extract the periods you need. difftime is in the base package and works well you can use positive and negative intervals with it. I hope his helps!

Upvotes: 0

Related Questions