Sebastian Zeki
Sebastian Zeki

Reputation: 6874

How to split a row into specific columns depending on the split

I have a list as follows:

list(character(0), c("\r19:0", "5:20", "7:40", "15:60", "21:80", 
"13:100", "17:120", "12:140", "10:160", "8:180"), c("\r6:0", 
"6:20", "11:40", "5:60", "1:80", "2:100", "1:120", "1:140", "2:160", 
"0:180"), "\r", c("\r12:0", "9:20", "34:40", "46:60", "26:80", 
"10:100", "8:120", "6:140", "6:160", "4:180"), c("\r6:0", "14:20", 
"26:40", "37:60", "43:80", "38:100", "51:120", "40:140", "21:160", 
"26:180"), c("\r0:0", "2:30", "22:60", "15:90", "9:120", "1:150", 
"1:180"), c("\r18:0", "14:30", "14:60", "13:90", "13:120", "11:150", 
"11:180"))

I want to create a dataframe that has column names corresponding to the value after each colon and for the value before each colon to be dropped in that column so the end result should be:

a0  a20  a30  a40  a60  a80  a90  a100  a120  a140  a150  a160   a180
19   5         7  15    21         13    17    12          10     8
6    6        11   5    1           2    1     1           2      0
12   9        34  46    26         10    8     6           6      4
0        2    22             15          9           1            1
18       14       14         13          13          11           11 

Upvotes: 0

Views: 85

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269371

Let L be the input list. Here are a couple of approaches. Neither use any packages. Both output a matrix result. (If it is important that the result be a data.frame convert it to one using as.data.frame(X) where X is the matrix produced.)

1) read.dcf We remove the \r characters switch the numbers before and after the colon and keep only those fields having colons. Combine all fields in each list component into a single character string separating them with newlines and adding an extra newline at the end. s is now a character vector in dcf form so read it using read.dcf. Next convert the data to numeric and order the columns.

s <- sapply(L, function(x) { 
  x <- sub("(.*):(.*)", "\\2: \\1", sub("\r", "", x))
  paste(c(grep(":", x, value = TRUE), "\n"), collapse = "\n")
})
m <- read.dcf(textConnection(s))
o <- order(as.numeric(colnames(m)))
array(as.numeric(m), dim(m), dimnames(m))[, o]

The result is this matrix:

      0 20 30 40 60 80 90 100 120 140 150 160 180
[1,] 19  5 NA  7 15 21 NA  13  17  12  NA  10   8
[2,]  6  6 NA 11  5  1 NA   2   1   1  NA   2   0
[3,] 12  9 NA 34 46 26 NA  10   8   6  NA   6   4
[4,]  6 14 NA 26 37 43 NA  38  51  40  NA  21  26
[5,]  0 NA  2 NA 22 NA 15  NA   9  NA   1  NA   1
[6,] 18 NA 14 NA 14 NA 13  NA  13  NA  11  NA  11

2) tapply First remove the junk eliminating any \r characters and any elements that do not contain a colon. Then form the row number and column name of each entry and then use tapply to put it all together creating the same matrix as above:

# remove junk
L2a <- lapply(L, sub, pattern = "\r", replacement = "") 
L2b <- lapply(L2a, grep, pattern = ":", value = TRUE)
L2c <- Filter(length, L2b)  # keep only entries with positive length

Rows <- rep(seq_along(L2c), lengths(L2c))
Cols <- factor(as.numeric(sub(".*:", "", unlist(L2c))))
Vals <- as.numeric(sub(":.*", "", unlist(L2c)))

tapply(Vals, list(Rows, Cols), c)

Upvotes: 1

Lorenzo Rossi
Lorenzo Rossi

Reputation: 1481

Another version of the answer from Grothendieck but bit more manual is to loop through all your elements and create the data frame.

So, called x your list you can do:

library(dplyr)
# For every elemt in your list
lapply(x, function(i){
  # remove the empty ones
  if(length(i) == 0)
    return(NULL)
  # delete the '\r' characters
  i = gsub("\r", "", i)
  # split the elements on the columns
  lapply(strsplit(i, ":"), function(j){ 
     # create a dataframe with the first item as value and the second as column name
     data_frame_(setNames(j[1], paste0("a", j[2])))
  }) %>% bind_cols() # bind the columns
}) %>% bind_rows() # and the bind the rows

Upvotes: 0

Related Questions