Reputation: 1094

What is the best way to convert several data on one line into multiple lines?

I have a data set with:

Each row represents one hour
6 data columns representing the value for each 10 minutes interval of that hour.

Example :

Timestamp              Value at hh:00   hh:10   hh:20   hh:30   hh:40  hh:50 
2018-07-18 00:00:00            100      101     102     103     104    105
2018-07-18 01:00:00            106      107     108     109     110    111

I would like to convert the horizontal data into normal vertical data with a complete timestamp, including the minutes. What is the best way ?

Expected result : (see the modified timestamp with + 10 minutes each row)

datetime                       value
2018-07-18 00:00:00            100 
2018-07-18 00:10:00            101
2018-07-18 00:20:00            102
2018-07-18 00:30:00            103
2018-07-18 00:40:00            104
2018-07-18 00:50:00            105
2018-07-18 01:00:00            106
2018-07-18 01:10:00            107
2018-07-18 01:20:00            108 
2018-07-18 01:30:00            109
2018-07-18 01:40:00            110 
2018-07-18 01:50:00            111

I have tried rep() but I can't figure out how to access the value without a for loop and manually adressing the values by their indices (i, j) .

I am trying to solve this problem by using correctly the R language (i.e. no "for" loop)

Upvotes: 1

Answers (3)

GKi

Reputation: 39657

In base you can try a combination of lapply and seq to get the datetime, which simply needs to be combined with the values by using a dataframe like:

data.frame(datetime=do.call(c, lapply(x[,1], seq, by="10 min", length.out = 6)), value=c(t(x[,-1])))
#              datetime value
#1  2018-07-18 00:00:00   100
#2  2018-07-18 00:10:00   101
#3  2018-07-18 00:20:00   102
#4  2018-07-18 00:30:00   103
#5  2018-07-18 00:40:00   104
#6  2018-07-18 00:50:00   105
#7  2018-07-18 01:00:00   106
#8  2018-07-18 01:10:00   107
#9  2018-07-18 01:20:00   108
#10 2018-07-18 01:30:00   109
#11 2018-07-18 01:40:00   110
#12 2018-07-18 01:50:00   111

or an alternative way:

data.frame(datetime=rep(x[,1], each=6) + as.difftime(0:5*10, units = "mins"), value=as.vector(t(x[,-1])))

In case you want to use the column-names as time-difference you can use stack with a different order of the output.

with(stack(x[,-1]), data.frame(datetime=x[,1] + as.difftime(as.numeric(ind), units = "mins"), values))

Data:

x  <- read.csv(check.names=FALSE, text="Timestamp,0,10,20,30,40,50
2018-07-18 00:00:00,100,101,102,103,104,105
2018-07-18 01:00:00,106,107,108,109,110,111")
x[,1]  <- as.POSIXct(x[,1])

Upvotes: 1

Orlando Sabogal

Reputation: 1630

Looks like what you wan is transform your data from a wide format to a long format. Install again tidyr library and use the new function pivot_longer()

See THIS

Upvotes: 1

JasonAizkalns

Reputation: 20463

Here's one way to wrangle this with dplyr, tidyr, and lubridate:

library(tidyverse)
library(lubridate)

df <- structure(list(Timestamp = structure(1:2, .Label = c("7/18/2018 0:00", "7/18/2018 1:00"), class = "factor"), Value.at.hh.00 = c(100L, 106L), hh.10 = c(101L, 107L), hh.20 = c(102L, 108L), hh.30 = c(103L, 109L), hh.40 = c(104L, 110L), hh.50 = c(105L, 111L)), class = "data.frame", row.names = c(NA, -2L))

df %>%
    gather(time, value, -Timestamp) %>%
    mutate(
        time = str_remove(time, "(Value.at.)?hh."),
        Timestamp = mdy_hm(Timestamp),
        minutes = dminutes(as.numeric(time)),
        datetime = Timestamp + seconds
    ) %>%
    select(
        datetime,
        value
    )

#               datetime value
# 1  2018-07-18 00:00:00   100
# 2  2018-07-18 01:00:00   106
# 3  2018-07-18 00:10:00   101
# 4  2018-07-18 01:10:00   107
# 5  2018-07-18 00:20:00   102
# 6  2018-07-18 01:20:00   108
# 7  2018-07-18 00:30:00   103
# 8  2018-07-18 01:30:00   109
# 9  2018-07-18 00:40:00   104
# 10 2018-07-18 01:40:00   110
# 11 2018-07-18 00:50:00   105
# 12 2018-07-18 01:50:00   111

First, make your data "long" using gather() (which as of tidyr v 1.0.0 will become pivot_longer()). Next, we do some cleanup and type conversion. We also minutes from our newly minted time column. Finally, we do some date arithmetic to get a datetime column.

N.B. If you are already on tidyr 1.0.0, gather(...) could become pivot_longer(cols = -Timestamp, names_to = "time", values_to = "value")

Upvotes: 1

What is the best way to convert several data on one line into multiple lines?

Answers (3)

Related Questions