Reputation: 1550
I have a list L of named vectors. For example, 1st element:
> L[[1]]
$event
[1] "EventA"
$time
[1] "1416355303"
$city
[1] "Los Angeles"
$region
[1] "California"
$Locale
[1] "en-GB"
When I unlist each element of the list, the resulting vectors look like this (for the first 3 elements):
> unlist(L[[1]])
        event          time          city        region        Locale
     "EventA"  "1416355303" "Los Angeles"  "California"       "en-GB"
> unlist(L[[2]])
       event         time       Locale
    "EventB" "1416417567"      "en-GB"
> unlist(L[[3]])
          event properties.time
       "EventM"    "1416417569"
I have over 0.5 million elements in the list, and each one has up to 42 of these features/names. I have to merge them into a data frame, taking into account their names and the fact that not all of them have the same number of features or names (in the example above, the second element has no information for region and city). At the moment, what I do is loop through the whole list:
df1 <- merge(stack(unlist(L[[1]])), stack(unlist(L[[2]])),
             by = "ind", all = TRUE)
suppressWarnings(for (i in 3:length(L)) {
  df1 <- merge(df1, stack(unlist(L[[i]])), by = "ind", all = TRUE)
})
df1 <- as.data.frame(t(df1))
For the example above this returns:
         V1          V2     V3     V4         V5
ind      city        event  Locale region     time
values.x Los Angeles EventA en-GB  California 1416355303
values.y <NA>        EventB en-GB  <NA>       1416417567
values   <NA>        EventM <NA>   <NA>       1416417569
which is what I want. However, given the length of the list, and because every call to
df1 <- merge(df1, stack(unlist(L[[i]])), by = "ind", all = TRUE)
has to process the entire data frame (df1) built so far, the loop takes a very long time. Therefore, I was wondering whether anyone knows a better/faster way to code this. In other words: given a long list of named vectors of different lengths, is there a fast way to merge them into a data frame like the one described above?
For example, is there a way of doing this using foreach and %dopar%? In any case, any faster approach is welcome.
Upvotes: 3
Views: 2715
Reputation: 41
The original post is about merging named vectors. Define the first two given in the example above as vectors:
> C1 <- c(event = "EventA", time = 1416355303,
          city = "Los Angeles", region = "California",
          Locale = "en-GB")
> C2 <- c(event = "EventB", time = 1416417567,
          Locale = "en-GB")
If you want to merge them and are happy to give up the extra data in the longer vector, you can index the longer vector by the names in the shorter vector:
> C1 <- C1[names(C2)]
Then just use rbind or cbind. Example with rbind:
> C1_C2 <- rbind(C1, C2)
> C1_C2
   event    time         Locale
C1 "EventA" "1416355303" "en-GB"
C2 "EventB" "1416417567" "en-GB"
You can combine the final two steps, but you will lose the row name of the first vector if you do.
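One way to combine the two steps and still keep both row names is to name the arguments passed to rbind. This is a small sketch using the vectors defined above (here with time written as a string, which is what c() coerces it to anyway):

```r
C1 <- c(event = "EventA", time = "1416355303",
        city = "Los Angeles", region = "California",
        Locale = "en-GB")
C2 <- c(event = "EventB", time = "1416417567",
        Locale = "en-GB")

# rbind cannot deduce a row name from an expression such as C1[names(C2)],
# so the argument names supply the row names instead
C1_C2 <- rbind(C1 = C1[names(C2)], C2 = C2)
C1_C2
```

Note that this still drops city and region from C1, exactly as the two-step version does.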
Upvotes: 1
Reputation: 99331
I've heard the data.table package is pretty fast. And rbindlist is perfect for this list.
library(data.table)
rbindlist(L, fill=TRUE)
#     event       time        city     region Locale
# 1: EventA 1416355303 Los Angeles California  en-GB
# 2: EventB 1416417567          NA         NA  en-GB
# 3: EventM 1416417569          NA         NA     NA
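The call above assumes every element of L is itself a list (or data.frame/data.table), which is what rbindlist expects. If the elements were atomic named vectors instead, converting each one with as.list first should work. A minimal sketch under that assumption, with made-up input in the shape of the question's data:

```r
library(data.table)

# Assumed input: a list of atomic named character vectors
L <- list(c(event = "EventA", time = "1416355303", city = "Los Angeles",
            region = "California", Locale = "en-GB"),
          c(event = "EventB", time = "1416417567", Locale = "en-GB"),
          c(event = "EventM", time = "1416417569"))

# as.list turns each named vector into a named list, i.e. a one-row record;
# fill = TRUE pads the columns an element does not have with NA
dt <- rbindlist(lapply(L, as.list), fill = TRUE)
dt
```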
Upvotes: 5
Reputation: 193517
Here's a compact solution to consider:
library(reshape2)
dcast(melt(L), L1 ~ L2, value.var = "value")
#   L1        city  event Locale     region       time
# 1  1 Los Angeles EventA  en-GB California 1416355303
# 2  2        <NA> EventB  en-GB       <NA> 1416417567
# 3  3        <NA> EventM   <NA>       <NA> 1416417569
Upvotes: 2
Reputation: 132706
I'm not sure why you use merge. It seems to me like you should simply rbind.
L <- list(list(event = "EventA", time = 1416355303,
               city = "Los Angeles", region = "California",
               Locale = "en-GB"),
          list(event = "EventB", time = 1416417567,
               Locale = "en-GB"),
          list(event = "EventM", time = 1416417569))
library(plyr)
do.call(rbind.fill, lapply(L, as.data.frame))
#   event       time        city     region Locale
#1 EventA 1416355303 Los Angeles California  en-GB
#2 EventB 1416417567        <NA>       <NA>  en-GB
#3 EventM 1416417569        <NA>       <NA>   <NA>
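For comparison, the same "bind rows, pad with NA" idea can be sketched in base R without any package: collect the union of all names, preallocate a character matrix, and fill one row per element. Nothing grows inside the loop, so it avoids re-copying an ever larger data frame. The names vecs, all_names, m and df are only illustrative, the sketch assumes it is fine to keep every value as character, and I have not benchmarked it on 0.5 million elements.

```r
L <- list(list(event = "EventA", time = "1416355303",
               city = "Los Angeles", region = "California",
               Locale = "en-GB"),
          list(event = "EventB", time = "1416417567",
               Locale = "en-GB"),
          list(event = "EventM", time = "1416417569"))

# Flatten each element to a named character vector
vecs <- lapply(L, unlist)

# Every name that occurs anywhere, in order of first appearance
all_names <- unique(unlist(lapply(vecs, names)))

# Preallocate the full result, filled with NA
m <- matrix(NA_character_, nrow = length(vecs), ncol = length(all_names),
            dimnames = list(NULL, all_names))

# Write each vector into its row, matching columns by name
for (i in seq_along(vecs)) {
  m[i, names(vecs[[i]])] <- vecs[[i]]
}

df <- as.data.frame(m, stringsAsFactors = FALSE)
df
```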
Upvotes: 2