Reputation: 6874
I have a list as follows:
list(character(0), c("\r19:0", "5:20", "7:40", "15:60", "21:80",
"13:100", "17:120", "12:140", "10:160", "8:180"), c("\r6:0",
"6:20", "11:40", "5:60", "1:80", "2:100", "1:120", "1:140", "2:160",
"0:180"), "\r", c("\r12:0", "9:20", "34:40", "46:60", "26:80",
"10:100", "8:120", "6:140", "6:160", "4:180"), c("\r6:0", "14:20",
"26:40", "37:60", "43:80", "38:100", "51:120", "40:140", "21:160",
"26:180"), c("\r0:0", "2:30", "22:60", "15:90", "9:120", "1:150",
"1:180"), c("\r18:0", "14:30", "14:60", "13:90", "13:120", "11:150",
"11:180"))
I want to create a dataframe that has column names corresponding to the value after each colon and for the value before each colon to be dropped in that column so the end result should be:
a0 a20 a30 a40 a60 a80 a90 a100 a120 a140 a150 a160 a180
19 5 7 15 21 13 17 12 10 8
6 6 11 5 1 2 1 1 2 0
12 9 34 46 26 10 8 6 6 4
0 2 22 15 9 1 1
18 14 14 13 13 11 11
Upvotes: 0
Views: 85
Reputation: 269371
Let L
be the input list. Here are a couple of approaches. Neither use any packages. Both output a matrix result. (If it is important that the result be a data.frame convert it to one using as.data.frame(X)
where X
is the matrix produced.)
1) read.dcf We remove the \r characters switch the numbers before and after the colon and keep only those fields having colons. Combine all fields in each list component into a single character string separating them with newlines and adding an extra newline at the end. s
is now a character vector in dcf form so read it using read.dcf
. Next convert the data to numeric and order the columns.
s <- sapply(L, function(x) {
x <- sub("(.*):(.*)", "\\2: \\1", sub("\r", "", x))
paste(c(grep(":", x, value = TRUE), "\n"), collapse = "\n")
})
m <- read.dcf(textConnection(s))
o <- order(as.numeric(colnames(m)))
array(as.numeric(m), dim(m), dimnames(m))[, o]
The result is this matrix:
0 20 30 40 60 80 90 100 120 140 150 160 180
[1,] 19 5 NA 7 15 21 NA 13 17 12 NA 10 8
[2,] 6 6 NA 11 5 1 NA 2 1 1 NA 2 0
[3,] 12 9 NA 34 46 26 NA 10 8 6 NA 6 4
[4,] 6 14 NA 26 37 43 NA 38 51 40 NA 21 26
[5,] 0 NA 2 NA 22 NA 15 NA 9 NA 1 NA 1
[6,] 18 NA 14 NA 14 NA 13 NA 13 NA 11 NA 11
2) tapply First remove the junk eliminating any \r
characters and any elements that do not contain a colon. Then form the row number and column name of each entry and then use tapply
to put it all together creating the same matrix as above:
# remove junk
L2a <- lapply(L, sub, pattern = "\r", replacement = "")
L2b <- lapply(L2a, grep, pattern = ":", value = TRUE)
L2c <- Filter(length, L2b) # keep only entries with positive length
Rows <- rep(seq_along(L2c), lengths(L2c))
Cols <- factor(as.numeric(sub(".*:", "", unlist(L2c))))
Vals <- as.numeric(sub(":.*", "", unlist(L2c)))
tapply(Vals, list(Rows, Cols), c)
Upvotes: 1
Reputation: 1481
Another version of the answer from Grothendieck but bit more manual is to loop through all your elements and create the data frame.
So, called x
your list you can do:
library(dplyr)
# For every elemt in your list
lapply(x, function(i){
# remove the empty ones
if(length(i) == 0)
return(NULL)
# delete the '\r' characters
i = gsub("\r", "", i)
# split the elements on the columns
lapply(strsplit(i, ":"), function(j){
# create a dataframe with the first item as value and the second as column name
data_frame_(setNames(j[1], paste0("a", j[2])))
}) %>% bind_cols() # bind the columns
}) %>% bind_rows() # and the bind the rows
Upvotes: 0