Reputation: 402
I have a flat CSV file containing data in long format that needs to be converted to a time series object. The file looks like this:
DATE ID REGION VALUE
2016-03-10 10 DE001 2332,23
2016-03-10 10 DE001 2332,23
2016-03-10 10 DE002 2332,23
2016-03-10 11 DE001 2332,23
2016-03-10 11 DE002 2332,23
2016-03-10 12 DE001 2332,23
2016-03-11 10 DE001 2332,23
2016-03-11 10 DE001 2332,23
2016-03-11 10 DE002 2332,23
2016-03-11 11 DE001 2332,23
2016-03-11 11 DE002 2332,23
2016-03-11 12 DE001 2332,23
I want to group by ID and then by region, so that I have a separate time series for each ID group, containing the observations of its regions over the complete available time span.
Upvotes: 1
Views: 89
Reputation: 756
I misunderstood the OP's question.
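You can use tapply to break up the original data frame (call it D). This is a bit tricky: you can't easily change D from inside the function passed to tapply, so the function only collects row numbers and offsets, and D is updated afterwards.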
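# D$DATE must already be of class Date for the subtraction below to work
D$relTime <- NA
L <- tapply(1:nrow(D), D$ID, function(x) {
  # x contains the row numbers for each ID
  RT <- data.frame(row = x)
  T0 <- D$DATE[x][1]  # base time: the first date listed for this ID
  RT$val <- as.numeric(D$DATE[x] - T0)  # offset in days, if time series means offset from a base time
  RT
})
DL <- do.call('rbind', L)
# assuming you want it in D
D$relTime[DL$row] <- DL$val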
This will create a new column, relTime, which contains each row's offset from the base time (the first date) of its ID.
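For comparison, the same per-ID offset can probably be obtained more compactly with ave(), which applies a function within each group and returns a vector aligned with the original rows. This is only a sketch of an alternative, not the approach above; the small data frame is a made-up sample, and DATE is assumed to be of class Date already.

```r
# Made-up sample in the question's layout (DATE already converted to Date)
D <- data.frame(
  DATE = as.Date(c("2016-03-10", "2016-03-11", "2016-03-10", "2016-03-11")),
  ID   = c(10, 10, 11, 11)
)

# Within each ID, subtract the first date of that ID from every date.
# as.numeric() turns the dates into day counts, so the result is in days.
D$relTime <- ave(as.numeric(D$DATE), D$ID, FUN = function(d) d - d[1])

print(D)
```

As in the tapply version, the base time is the first date listed for each ID, so the rows are assumed to be in date order within an ID.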
Edit: I used '=' for assignment, which isn't considered best practice. I've changed it to '<-' in the code above.
Upvotes: 1
Reputation: 756
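You can use the as.Date function. Load the table using read.table("file.csv", header=TRUE); header=TRUE is needed because the first line holds the column names. In R versions before 4.0.0 the dates will be loaded as factors unless you specify stringsAsFactors=FALSE in the read.table call (doing so applies to all character columns); from 4.0.0 on, character strings are the default. Your VALUE column uses a decimal comma, so pass dec="," as well if you want it read as a number.
So,
D <- read.table("file.csv", header=TRUE, dec=",")
D$DATE <- as.Date(as.character(D$DATE), "%Y-%m-%d")
should do the trick. The as.character ensures that the dates are passed to as.Date as strings even if they have been loaded as factors.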
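To try this without a file on disk, the same steps can be run on an inline copy of a few rows through the text argument of read.table. This is a minimal sketch: the rows are copied from the question, and the object names txt and D are just placeholders.

```r
# A few rows from the question, supplied inline instead of via a file
txt <- "DATE ID REGION VALUE
2016-03-10 10 DE001 2332,23
2016-03-10 10 DE002 2332,23
2016-03-11 10 DE001 2332,23
2016-03-11 10 DE002 2332,23"

# header = TRUE: the first line holds the column names
# dec = ",":     VALUE uses a decimal comma
D <- read.table(text = txt, header = TRUE, dec = ",", stringsAsFactors = FALSE)

# as.character() keeps this safe even if DATE had been read as a factor
D$DATE <- as.Date(as.character(D$DATE), "%Y-%m-%d")

str(D)
```

After this, DATE should be of class Date and VALUE numeric, which is what the later grouping steps rely on.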
More info:
https://stat.ethz.ch/R-manual/R-devel/library/base/html/as.Date.html
https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html
Technically, it's not a csv file, because the "c" in "csv" stands for "comma" and your separators are spaces. But you can still use read.csv if you specify sep=' ' in the call.
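Once the file is read in (here with read.csv and sep = " "), the grouping asked for in the question could be sketched roughly as follows: split the rows by ID, then build a DATE by REGION matrix for each ID. The values below are made up so the cells are distinguishable, and using mean to collapse repeated DATE/REGION rows is an assumption about what should happen with duplicates.

```r
# Made-up sample in the question's layout (space-separated, decimal comma)
txt <- "DATE ID REGION VALUE
2016-03-10 10 DE001 1,5
2016-03-10 10 DE002 2,5
2016-03-10 11 DE001 3,5
2016-03-11 10 DE001 4,5
2016-03-11 10 DE002 5,5
2016-03-11 11 DE001 6,5"

# read.csv assumes a header line by default; only sep and dec need overriding
D <- read.csv(text = txt, sep = " ", dec = ",", stringsAsFactors = FALSE)
D$DATE <- as.Date(D$DATE, "%Y-%m-%d")

# One list element per ID; each element is a matrix with dates as rows
# and regions as columns. mean() is an assumed way to merge duplicates.
series <- lapply(split(D, D$ID), function(d) {
  tapply(d$VALUE, list(DATE = as.character(d$DATE), REGION = d$REGION), mean)
})

series[["10"]]
```

Each matrix only has columns for the regions that actually occur under that ID. The row names are the dates as strings, so they can be converted back with as.Date if a dedicated time series class is needed afterwards.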
Upvotes: 0