ad-ryan
ad-ryan

Reputation: 1

How to combine data with same rownames to one column in R

I'm trying to move a large list with >200000 character from this:

startTime  1
max        3
min        1        
EndTime    2
avg        2     
startTime  2
max       ..
min       ..
EndTime   ..
avg       ..
..

to a dataframe like this:

startTime  max  min  EndTime  avg  
1          3    1    2        2
2          ..   ..   ..       ..

I managed it by looping it through a for-loop. It takes to much time. Is there a more sufficient way by not looping it through a for-loop?

Upvotes: 0

Views: 71

Answers (5)

moodymudskipper
moodymudskipper

Reputation: 47320

A tidyverse solution using @markus' data would be :

library(tidyverse)
dat %>% 
  group_by(tmp = cumsum(V1=="startTime"))  %>%
  spread(V1,V2) %>%
  ungroup %>%
  select(-tmp)

# # A tibble: 2 x 5
#     avg EndTime   max   min startTime
#   <int>   <int> <int> <int>     <int>
# 1     2       2     3     1         1
# 2     6       5     3     4         2

Upvotes: 0

G. Grothendieck
G. Grothendieck

Reputation: 269586

Here are some alternatives. They do not use any packages.

Assume the input DF shown reproducibly in the Note at the end.

1) xtabs The first line of code converts the first column to character in case it is factor. We do not need this with the data shown in the Note but it doesn't hurt and might be useful if the column were factor so that it is in a known state.

Then convert the V1 column to a factor having levels in the order that appear so that they don't get rearranged upon output. Also define nicer names and create a Group number vector which numbers the first group of 5 rows as 1, the second group 2 and so on.

Finally use xtabs to create the desired table. If you prefer a data frame as the output rather than a table then use as.data.frame(xt).

DF2 <- transform(DF, V1 = as.character(V1))
DF2 <- transform(DF2, Stat = factor(V1, levels = V1[1:5]), 
  Value = V2,
  Group = cumsum(V1== "startTime"))
xt <- xtabs(Value ~ Group + Stat, DF2)
xt

giving:

     Stat
Group startTime max min EndTime avg
    1         1   3   1       2   2
    2         2   4   1       3   2

2) matrix Even shorter is this one-liner. It gives a matrix. Use as.data.frame(m) if you want a data frame.

m <- matrix(DF$V2,, 5, byrow = TRUE, list(NULL, DF$V1[1:5]))
m

giving:

     startTime max min EndTime avg
[1,]         1   3   1       2   2
[2,]         2   4   1       3   2

Note

The input in reproducible form. I have added a few rows.

Lines <- "
startTime  1
max        3
min        1        
EndTime    2
avg        2     
startTime  2
max        4
min        1
EndTime    3
avg        2"
DF <- read.table(text = Lines, as.is = TRUE)

Upvotes: 0

markus
markus

Reputation: 26343

Expanding your input data a bit you could use unstack from base R.

Input:

dat
#          V1 V2
#1  startTime  1
#2        max  3
#3        min  1
#4    EndTime  2
#5        avg  2
#6  startTime  2
#7        max  3
#8        min  4
#9    EndTime  5
#10       avg  6

Result:

out <- unstack(dat, V2 ~ V1)
out
#  avg EndTime max min startTime
#1   2       2   3   1         1
#2   6       5   3   4         2

If you want the column names in the same order as the they appear in dat$V1 do

out <- out[unique(dat$V1)]

data

dat <- structure(list(V1 = c("startTime", "max", "min", "EndTime", "avg", 
"startTime", "max", "min", "EndTime", "avg"), V2 = c(1L, 3L, 
1L, 2L, 2L, 2L, 3L, 4L, 5L, 6L)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, 
-10L))

Upvotes: 1

Rui Barradas
Rui Barradas

Reputation: 76402

This is not an exact duplicate of How to reshape data from long to wide format? so I will answer.

First create a new column ID and then use one of the solutions in the duplicate. I will use the solution based on package reshape2.

pattern <- as.character(df1[1, 1])
ipat <- grep(pattern, df1[[1]])

df1$ID <- rep(seq_along(ipat), nrow(df1)/length(ipat))

library(reshape2)

result <- dcast(df1, ID ~ V1, value.var = "V2")[-1]
#  avg EndTime max min startTime
#1   2       3   4   1         1
#2   1       2   3   2         2

Final clean up, put the input dataset df1 back as it were.

df1 <- df1[-ncol(df1)]

Data.

df1 <- read.table(text = "
startTime  1
max        3
min        1        
EndTime    2
avg        2     
startTime  2
max        4
min        2
EndTime    3
avg        1                  
")

Upvotes: 0

Wimpel
Wimpel

Reputation: 27732

simply tranform it

library( data.table )

dt <- data.table::fread(" startTime  1
max        3
                        min        1        
                        EndTime    2
                        avg        2     
                        startTime  2", header = FALSE)

as.data.table( t( dt ) )

#           V1  V2  V3      V4  V5        V6
# 1: startTime max min EndTime avg startTime
# 2:         1   3   1       2   2         2

Upvotes: 0

Related Questions