Reputation: 1094
I have a data set with:
Example :
Timestamp Value at hh:00 hh:10 hh:20 hh:30 hh:40 hh:50
2018-07-18 00:00:00 100 101 102 103 104 105
2018-07-18 01:00:00 106 107 108 109 110 111
I would like to convert the horizontal data into normal vertical data with a complete timestamp, including the minutes. What is the best way ?
Expected result : (see the modified timestamp with + 10 minutes each row)
datetime value
2018-07-18 00:00:00 100
2018-07-18 00:10:00 101
2018-07-18 00:20:00 102
2018-07-18 00:30:00 103
2018-07-18 00:40:00 104
2018-07-18 00:50:00 105
2018-07-18 01:00:00 106
2018-07-18 01:10:00 107
2018-07-18 01:20:00 108
2018-07-18 01:30:00 109
2018-07-18 01:40:00 110
2018-07-18 01:50:00 111
I have tried rep() but I can't figure out how to access the value without a for loop and manually adressing the values by their indices (i, j) .
I am trying to solve this problem by using correctly the R language (i.e. no "for" loop)
Upvotes: 1
Views: 170
Reputation: 39657
In base you can try a combination of lapply
and seq
to get the datetime, which simply needs to be combined with the values by using a dataframe
like:
data.frame(datetime=do.call(c, lapply(x[,1], seq, by="10 min", length.out = 6)), value=c(t(x[,-1])))
# datetime value
#1 2018-07-18 00:00:00 100
#2 2018-07-18 00:10:00 101
#3 2018-07-18 00:20:00 102
#4 2018-07-18 00:30:00 103
#5 2018-07-18 00:40:00 104
#6 2018-07-18 00:50:00 105
#7 2018-07-18 01:00:00 106
#8 2018-07-18 01:10:00 107
#9 2018-07-18 01:20:00 108
#10 2018-07-18 01:30:00 109
#11 2018-07-18 01:40:00 110
#12 2018-07-18 01:50:00 111
or an alternative way:
data.frame(datetime=rep(x[,1], each=6) + as.difftime(0:5*10, units = "mins"), value=as.vector(t(x[,-1])))
In case you want to use the column-names as time-difference you can use stack
with a different order of the output.
with(stack(x[,-1]), data.frame(datetime=x[,1] + as.difftime(as.numeric(ind), units = "mins"), values))
Data:
x <- read.csv(check.names=FALSE, text="Timestamp,0,10,20,30,40,50
2018-07-18 00:00:00,100,101,102,103,104,105
2018-07-18 01:00:00,106,107,108,109,110,111")
x[,1] <- as.POSIXct(x[,1])
Upvotes: 1
Reputation: 1630
Looks like what you wan is transform your data from a wide format to a long format. Install again tidyr library and use the new function pivot_longer()
See THIS
Upvotes: 1
Reputation: 20463
Here's one way to wrangle this with dplyr
, tidyr
, and lubridate
:
library(tidyverse)
library(lubridate)
df <- structure(list(Timestamp = structure(1:2, .Label = c("7/18/2018 0:00", "7/18/2018 1:00"), class = "factor"), Value.at.hh.00 = c(100L, 106L), hh.10 = c(101L, 107L), hh.20 = c(102L, 108L), hh.30 = c(103L, 109L), hh.40 = c(104L, 110L), hh.50 = c(105L, 111L)), class = "data.frame", row.names = c(NA, -2L))
df %>%
gather(time, value, -Timestamp) %>%
mutate(
time = str_remove(time, "(Value.at.)?hh."),
Timestamp = mdy_hm(Timestamp),
minutes = dminutes(as.numeric(time)),
datetime = Timestamp + seconds
) %>%
select(
datetime,
value
)
# datetime value
# 1 2018-07-18 00:00:00 100
# 2 2018-07-18 01:00:00 106
# 3 2018-07-18 00:10:00 101
# 4 2018-07-18 01:10:00 107
# 5 2018-07-18 00:20:00 102
# 6 2018-07-18 01:20:00 108
# 7 2018-07-18 00:30:00 103
# 8 2018-07-18 01:30:00 109
# 9 2018-07-18 00:40:00 104
# 10 2018-07-18 01:40:00 110
# 11 2018-07-18 00:50:00 105
# 12 2018-07-18 01:50:00 111
First, make your data "long" using gather()
(which as of tidyr
v 1.0.0 will become pivot_longer()
). Next, we do some cleanup and type conversion. We also minutes from our newly minted time
column. Finally, we do some date arithmetic to get a datetime
column.
N.B. If you are already on tidyr
1.0.0, gather(...)
could become pivot_longer(cols = -Timestamp, names_to = "time", values_to = "value")
Upvotes: 1