Reputation: 508
I am trying to create an artificial data frame of the words added and deleted by Wikipedia users in each edit they make. The end result should look like this:
I created some artificial data to build such a frame, but I'm having problems with the variables "Tokens Added" and "Tokens Deleted".
I thought creating them as lists of lists would allow me to include them in data frames even if the elements do not always have equal length. But apparently that's not the case. Instead, R creates a variable for each individual token. That's not feasible because it would create millions of variables. Here is some code to exemplify:
a <- c(1,2,3)
e <- list(b = as.list(c("a","b")),c = as.list(c(1L,3L,5L,4L)),d = as.list(c(TRUE,FALSE,TRUE)))
DF <- cbind(a,e)
U <- data.frame(a,e)
I would like to have it like this:
Is this possible at all in R with data frames? (I already tried searching for answers, but they were either for different questions or too technical for me.) Any help is much appreciated!
Upvotes: 3
Views: 3770
Reputation: 10671
You can do exactly what you want if you are willing to use library(tibble):
library(tibble)
a <- c(1,2,3)
e <- list(b = as.list(c("a","b")),c = as.list(c(1L,3L,5L,4L)),d = as.list(c(TRUE,FALSE,TRUE)))
tibble(a,e)
# A tibble: 3 × 2
#       a e
#   <dbl> <list>
# 1     1 <list [2]>
# 2     2 <list [4]>
# 3     3 <list [3]>
A tibble (or tbl_df) will behave just like you are used to with a traditional data.frame, but gives you some nice extra functionality, such as storing lists of various lengths in a column.
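As a minimal sketch of how such a list column can be used afterwards (assuming the tibble package is installed; the column here holds plain vectors instead of the nested lists from the question, and the name df is only illustrative):

```r
library(tibble)

a <- c(1, 2, 3)
# One vector per row; the vectors may differ in length and type
e <- list(c("a", "b"), c(1L, 3L, 5L, 4L), c(TRUE, FALSE, TRUE))

df <- tibble(a = a, e = e)

# Pull out the tokens stored in the second row
df$e[[2]]

# Count the tokens in each row
lengths(df$e)
```

Each cell of e is an ordinary R vector, so the usual vector functions apply once a cell is extracted with [[ ]].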
Upvotes: 2
Reputation: 508
Thanks for all the suggestions everyone! I think I found a simpler solution though. Just in case anyone else has a similar problem in the future, this is what I did:
a <- c(1,2,3)
b <- c("a","b")
c <- c(1L,3L,5L,4L)
d <- c(TRUE,FALSE,TRUE)
e <- list(b,c,d);e
DF <- data.frame(a,I(e));DF
The I() inhibit function apparently prevents the lists from being converted and the column behaves just like a list of lists as far as I can tell so far. The class of the e column is however not "list" but "AsIs". I don't know whether this might cause problems further down the line, if so, I will update this answer!
EDIT
So it turns out that some functions do not accept the AsIs class as input. To convert it back to a useful character string, you can simply use unlist() on every row.
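A small base R sketch of the I() approach and two ways around the AsIs class, under the assumption that the column holds plain vectors as in the code above (the names plain and DF2 are only illustrative):

```r
a <- c(1, 2, 3)
e <- list(c("a", "b"), c(1L, 3L, 5L, 4L), c(TRUE, FALSE, TRUE))

# I() keeps the list intact as a single column of class "AsIs"
DF <- data.frame(a, e = I(e))
class(DF$e)

# Individual cells are ordinary vectors
DF$e[[2]]

# unclass() drops the "AsIs" marker and leaves a plain list
plain <- unclass(DF$e)
class(plain)

# Alternative: add the list as a column after creating the data frame,
# which stores it with class "list" directly
DF2 <- data.frame(a)
DF2$e <- e
class(DF2$e)
```

Assigning the column afterwards avoids the AsIs class entirely, which may be simpler if downstream functions are picky about classes.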
Upvotes: 1
Reputation: 14370
I don't think what you want is possible using a vector of lists (as you suggest in your question). This is mainly because you can't create a vector of lists in R (see: How to create a vector of lists in R?)
However, one option (if you really want a data.frame) would be to coerce everything to a character (the most flexible type in R). Something like this might work for you:
e <- c(paste0(c("a","b"),collapse=","), paste0(c(1L,3L,5L,4L), collapse = ","), paste0(c(TRUE,FALSE,TRUE), collapse = ","))
U <- data.frame(a, e, stringsAsFactors = FALSE)
U
# a e
#1 1 a,b
#2 2 1,3,5,4
#3 3 TRUE,FALSE,TRUE
Then you can back out the value of each cell with a split. Something like:
strsplit(U$e, ",")
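One caveat with this approach is that every value comes back from strsplit() as character, so the original types have to be restored by hand. A rough sketch, assuming you know which row holds which type (the name parts is only illustrative):

```r
a <- c(1, 2, 3)
e <- c("a,b", "1,3,5,4", "TRUE,FALSE,TRUE")
U <- data.frame(a, e, stringsAsFactors = FALSE)

# Split each cell on the comma; the result is a list of character vectors
parts <- strsplit(U$e, ",", fixed = TRUE)

# The numbers in row 2 are strings at this point
parts[[2]]

# Convert back explicitly where the type is known
as.integer(parts[[2]])
as.logical(parts[[3]])
```

This also means that tokens which themselves contain a comma would be split incorrectly, so a separator that cannot occur in the data is a safer choice.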
Upvotes: 1
Reputation: 2070
Try this:
cbind(a,lapply(e,function(x) paste(unlist(x),collapse=",")))
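Note that cbind() with the result of lapply() gives a matrix whose cells are list elements, not a data frame. If a data frame with an ordinary character column is preferred, a possible variant (a sketch, not the answerer's code) uses vapply() instead:

```r
a <- c(1, 2, 3)
e <- list(b = as.list(c("a", "b")),
          c = as.list(c(1L, 3L, 5L, 4L)),
          d = as.list(c(TRUE, FALSE, TRUE)))

# Collapse each inner list into one comma-separated string.
# USE.NAMES = FALSE stops the list names from becoming row names.
collapsed <- vapply(e, function(x) paste(unlist(x), collapse = ","),
                    character(1), USE.NAMES = FALSE)

U <- data.frame(a, e = collapsed, stringsAsFactors = FALSE)
U
```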
Upvotes: 0