Reputation: 860
I have the following hypothetical list:
test <- list(a = c("United", "States", "of", "America", "2021", "North", "America"),
b = c("Canada", "2021", "North", "America"),
c = c("Morocco", "2021", "Africa"),
d = c("South", "Africa", "2021", "Africa"),
e = c("Faroe", "Islands", "2021", "Europe"),
f = c("Spain", "2021", "Europe"))
I would like to produce the following tibble:
country | year | continent |
---|---|---|
United States of America | 2021 | North America |
Canada | 2021 | North America |
Morocco | 2021 | Africa |
South Africa | 2021 | Africa |
Faroe Islands | 2021 | Europe |
Spain | 2021 | Europe |
I tried to use the ldply() function of the plyr package. However, my list elements have unequal lengths because of the compound names.
How could I join this data into a tibble with the variables country, year and continent?
Upvotes: 3
Views: 73
Reputation: 102519
Here is another base R option:
type.convert(
  setNames(
    data.frame(
      do.call(
        rbind,
        lapply(
          test,
          function(v) {
            tapply(
              v,
              cumsum(c(1, diff(grepl("\\d+", v)) != 0)),
              paste0,
              collapse = " "
            )
          }
        )
      )
    ),
    c("Country", "Year", "Continent")
  ),
  as.is = TRUE
)
which gives
Country Year Continent
a United States of America 2021 North America
b Canada 2021 North America
c Morocco 2021 Africa
d South Africa 2021 Africa
e Faroe Islands 2021 Europe
f Spain 2021 Europe
Upvotes: 2
Reputation: 79328
In base R you could do:
a <- do.call(rbind, lapply(test, function(x) paste(sub("(\\d+)", ",\\1,", x), collapse = " ")))
read.csv(text = a, col.names = c("Country", "Year", "Continent"), header = FALSE, strip.white = TRUE)
Country Year Continent
1 United States of America 2021 North America
2 Canada 2021 North America
3 Morocco 2021 Africa
4 South Africa 2021 Africa
5 Faroe Islands 2021 Europe
6 Spain 2021 Europe
Upvotes: 3
Reputation: 226732
Slightly less general/efficient than the other solutions here but maybe more transparent?
cfun <- function(x) {
  ## find position of numeric value
  numpos <- grep("^[0-9]+$", x)
  ## combine elements appropriately
  list(country = paste(x[1:(numpos - 1)], collapse = " "),
       year = x[numpos],
       continent = paste(x[(numpos + 1):length(x)], collapse = " "))
}
purrr::map_dfr(test, cfun)
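For anyone who wants the same idea without purrr, here is a minimal base R sketch. The helper name split_record is made up for illustration, and it assumes every element contains exactly one all-digit token (the year) that is not in the first position. Only two of the question's elements are used to keep it short.

```r
# Two elements taken from the question's list
test <- list(
  a = c("United", "States", "of", "America", "2021", "North", "America"),
  c = c("Morocco", "2021", "Africa")
)

# Hypothetical helper: split one character vector around its year token
split_record <- function(x) {
  numpos <- grep("^[0-9]+$", x)[1]  # position of the first all-digit element
  data.frame(
    country   = paste(x[seq_len(numpos - 1)], collapse = " "),  # everything before the year
    year      = as.integer(x[numpos]),
    continent = paste(x[-seq_len(numpos)], collapse = " "),     # everything after the year
    stringsAsFactors = FALSE
  )
}

# One single-row data frame per element, stacked into one
res <- do.call(rbind, lapply(test, split_record))
rownames(res) <- NULL
res
```

If a tibble is needed, the result can be wrapped in tibble::as_tibble(res).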
Upvotes: 3
Reputation: 887711
An option is to use rleid to create a grouping based on the occurrence of digits in the list, then paste the list elements and rbind them:
library(data.table)
out <- type.convert(do.call(rbind.data.frame, lapply(test, function(x)
tapply(x, rleid(grepl('\\d+', x)), paste, collapse=' '))), as.is = TRUE)
colnames(out) <- c('country', 'year', 'continent')
row.names(out) <- NULL
-output
out
# country year continent
#1 United States of America 2021 North America
#2 Canada 2021 North America
#3 Morocco 2021 Africa
#4 South Africa 2021 Africa
#5 Faroe Islands 2021 Europe
#6 Spain 2021 Europe
Or use a similar option with rle from base R:
out <- type.convert(do.call(rbind.data.frame,
lapply(test, function(x) tapply(x, with(rle(grepl('\\d+', x)),
rep(seq_along(values), lengths)), FUN = paste, collapse=' '))),
as.is = TRUE)
colnames(out) <- c('country', 'year', 'continent')
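To make the grouping step easier to follow, here is a small sketch that runs it on a single element and exposes the intermediate values. The variable names is_num, r, grp and parts are only for illustration.

```r
x <- c("United", "States", "of", "America", "2021", "North", "America")

is_num <- grepl("\\d+", x)   # TRUE where the element contains a digit
r <- rle(is_num)             # runs of consecutive equal values
grp <- rep(seq_along(r$values), r$lengths)  # one group id per run
grp

# paste the elements of each run together
parts <- tapply(x, grp, paste, collapse = " ")
parts
```

The approach relies on each vector producing exactly three runs (non-digit, digit, non-digit). A name containing a digit, or a year in the first position, would change the number of groups and therefore the columns.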
Upvotes: 2