Reputation: 556

Transformation of a vector to a data frame with two columns in R

I have the following dataset, it consists of ID's (integer, although all is stored as a factor) and units of time, which belong to the ID above (see an excerpt of data below, total data has 300'000+ values).

What I want is a data frame, which has in one column the ID and the other column the time. Something like this:

10000007  01:33:28
10000007  00:00:05
10000010  00:00:39
10000010  00:02:21
10000010  00:05:59
10000010  00:40:40
10000010  00:09:53
10000014  00:09:37
10000014  00:00:00
10000014  00:00:15
10000014  00:00:02
10000014  00:40:51
...

I struggle with the fact that the number of time stamps differs between the different ID's and the fact that I can't change the format to numeric efficiently, which might make the operation handier.

Eventually, the idea is to have the times summed up to have only one time per ID. Thank you very much in advance!

Upvotes: 1

Answers (3)

akrun

Reputation: 887891

In base R, we can use stack

i1 <- !grepl(":", df$V1)
out <- stack(setNames(split(df$V1[!i1], cumsum(i1)[!i1]), df$V1[i1]))[2:1]
head(out)
#    ind   values
#1 10000007 01:33:28
#2 10000007 00:00:05
#3 10000010 00:00:39
#4 10000010 00:02:21
#5 10000010 00:05:59
#6 10000010 00:40:40


dim(out)
#[1] 30  2

Or it can be done with data.frame call

data.frame(V1 = df$V1[i1][cumsum(i1)], V2 = df$V1)[!i1,]

Or with transform

transform(df, V2 = V1[i1][cumsum(i1)])[!i1,]

data

df <- structure(list(V1 = c("10000007", "01:33:28", "00:00:05", "10000010", 
"00:00:39", "00:02:21", "00:05:59", "00:40:40", "00:09:53", "10000014", 
"00:09:37", "00:00:00", "00:00:15", "00:00:02", "00:40:51", "10000022", 
"00:41:44", "01:32:32", "10000023", "01:07:14", "01:25:27", "1000003", 
"00:00:58", "00:03:42", "00:05:58", "00:00:29", "00:00:01", "00:04:02", 
"00:00:10", "00:01:08", "00:07:52", "00:10:18", "00:12:52", "00:09:39", 
"00:07:48", "00:04:29")), class = "data.frame", row.names = c(NA, 
-36L))

Upvotes: 1

jay.sf

Reputation: 73692

In base R split at those values where as.POSIXct yield NA, then rbind the result.

res <- do.call(rbind.data.frame, 
               by(x, cumsum(is.na(as.POSIXct(x, format="%T"))), function(x) 
                 cbind(as.character(x)[1], as.character(x)[-1])))
head(res)
#           V1       V2
# 1.1 10000007 01:33:28
# 1.2 10000007 00:00:05
# 2.1 10000010 00:00:39
# 2.2 10000010 00:02:21
# 2.3 10000010 00:05:59
# 2.4 10000010 00:40:40

Data:

x <- structure(c(31L, 30L, 4L, 32L, 8L, 11L, 16L, 24L, 21L, 33L, 19L, 
1L, 6L, 3L, 25L, 34L, 26L, 29L, 35L, 27L, 28L, 36L, 9L, 12L, 
15L, 7L, 2L, 13L, 5L, 10L, 18L, 22L, 23L, 20L, 17L, 14L), .Label = c("00:00:00", 
"00:00:01", "00:00:02", "00:00:05", "00:00:10", "00:00:15", "00:00:29", 
"00:00:39", "00:00:58", "00:01:08", "00:02:21", "00:03:42", "00:04:02", 
"00:04:29", "00:05:58", "00:05:59", "00:07:48", "00:07:52", "00:09:37", 
"00:09:39", "00:09:53", "00:10:18", "00:12:52", "00:40:40", "00:40:51", 
"00:41:44", "01:07:14", "01:25:27", "01:32:32", "01:33:28", "10000007", 
"10000010", "10000014", "10000022", "10000023", "1000003"), class = "factor")

Upvotes: 1

tmfmnk

Reputation: 40171

One dplyr option could be:

df %>%
 group_by(grp = cumsum(grepl("^100", V1))) %>%
 mutate(V2 = first(V1)) %>%
 slice(-1) %>%
 ungroup() %>%
 select(-grp) 

   V1       V2      
   <chr>    <chr>   
 1 01:33:28 10000007
 2 00:00:05 10000007
 3 00:00:39 10000010
 4 00:02:21 10000010
 5 00:05:59 10000010
 6 00:40:40 10000010
 7 00:09:53 10000010
 8 00:09:37 10000014
 9 00:00:00 10000014
10 00:00:15 10000014
# … with 20 more rows

Upvotes: 1

Transformation of a vector to a data frame with two columns in R

Answers (3)

data

Related Questions