Wilson Souza

Reputation: 860

Convert a list to a data frame, handling compound names

I have the following hypothetical list

test <- list(a = c("United", "States", "of", "America", "2021", "North", "America"),
             b = c("Canada", "2021", "North", "America"),
             c = c("Morocco", "2021", "Africa"),
             d = c("South", "Africa", "2021", "Africa"),
             e = c("Faroe", "Islands", "2021", "Europe"),
             f = c("Spain", "2021", "Europe"))

I would like to produce the following tibble:

country                  year continent
United States of America 2021 North America
Canada                   2021 North America
Morocco                  2021 Africa
South Africa             2021 Africa
Faroe Islands            2021 Europe
Spain                    2021 Europe

I tried to use the ldply() function of the plyr package. However, my list elements have unequal lengths, because of compound names.

How could I join this data in a tibble with the variables: country, year and continent, for example?

Upvotes: 3

Views: 73

Answers (4)

ThomasIsCoding

Reputation: 102519

Here is another base R option

type.convert(setNames(
  data.frame(
    do.call(
      rbind,
      lapply(
        test,
        function(v) {
          tapply(v,
            cumsum(c(1, diff(grepl("\\d+", v)) != 0)), paste0,
            collapse = " "
          )
        }
      )
    )
  ), c("Country", "Year", "Continent")
),
as.is = TRUE
)

which gives

                   Country Year     Continent
a United States of America 2021 North America
b                   Canada 2021 North America
c                  Morocco 2021        Africa
d             South Africa 2021        Africa
e            Faroe Islands 2021        Europe
f                    Spain 2021        Europe
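To see how the grouping index in that call comes about, here is a small sketch on a single element of the question's list. The variable names (`is_num`, `changes`, `grp`, `pieces`) are only for illustration and do not appear in the answer.

```r
# One element of the question's list, repeated here so the snippet runs alone
v <- c("South", "Africa", "2021", "Africa")

is_num  <- grepl("\\d+", v)       # FALSE FALSE  TRUE FALSE
changes <- diff(is_num) != 0      # FALSE  TRUE  TRUE: TRUE where a new run starts
grp     <- cumsum(c(1, changes))  # 1 1 2 3: one id per run of words or digits

# Paste the pieces within each run back together
pieces <- tapply(v, grp, paste0, collapse = " ")
as.vector(pieces)
# "South Africa" "2021" "Africa"
```

Each element then yields three pieces (country, year, continent), which is why `rbind` can stack them into rows of equal length. This assumes every element contains exactly one run of digit-bearing strings.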

Upvotes: 2

Onyambu

Reputation: 79328

In base R you could do:

a <- do.call(rbind, lapply(test, function(x) paste(sub("(\\d+)",",\\1,", x), collapse = " ")))

read.csv(text = a, col.names = c("Country", "Year", "Continent"), header = FALSE)
                    Country Year      Continent
1 United States of America  2021  North America
2                   Canada  2021  North America
3                  Morocco  2021         Africa
4             South Africa  2021         Africa
5            Faroe Islands  2021         Europe
6                    Spain  2021         Europe
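A quick look at the intermediate string may help here. This sketch runs the same `sub()`/`paste()` step on one element and reads it back. Note that the text fields next to the year keep a stray space unless whitespace is stripped, so passing `strip.white = TRUE` is a suggested tweak, not part of the answer above.

```r
x <- c("Faroe", "Islands", "2021", "Europe")

# Wrap the year in commas, then glue the pieces into one CSV-style line
line <- paste(sub("(\\d+)", ",\\1,", x), collapse = " ")
line
# "Faroe Islands ,2021, Europe"

# strip.white = TRUE drops the blanks left around the commas
res <- read.csv(text = line, header = FALSE, strip.white = TRUE,
                col.names = c("Country", "Year", "Continent"))
res
```

Without `strip.white = TRUE`, `Country` would come back as `"Faroe Islands "` and `Continent` as `" Europe"`, which can bite later in joins or comparisons.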

Upvotes: 3

Ben Bolker

Reputation: 226732

Slightly less general/efficient than the other solutions here but maybe more transparent?

cfun <- function(x) {
    ## find position of numeric value
    numpos <- grep("^[0-9]+$", x)
    ## combine elements appropriately
    list(country=paste(x[1:(numpos-1)], collapse=" "),
         year=x[numpos],
         continent=paste(x[(numpos+1):length(x)], collapse=" "))
}

purrr::map_dfr(test,cfun)
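If you would rather avoid the purrr dependency, the same idea can be sketched in base R. The helper name `split_year` is hypothetical. It uses `head()`/`tail()` instead of `1:(numpos-1)` so that an element with nothing before or after the year gives an empty string, not a wrong index, and it converts the year to integer. It assumes exactly one all-digit element per vector.

```r
# Reduced copy of the question's data so the snippet runs on its own
test <- list(a = c("United", "States", "of", "America", "2021", "North", "America"),
             f = c("Spain", "2021", "Europe"))

# Hypothetical helper: same logic as cfun, returning a one-row data frame
split_year <- function(x) {
    numpos <- grep("^[0-9]+$", x)
    stopifnot(length(numpos) == 1)   # assumes exactly one all-digit element
    data.frame(country   = paste(head(x, numpos - 1), collapse = " "),
               year      = as.integer(x[numpos]),
               continent = paste(tail(x, -numpos), collapse = " "))
}

# Stack the one-row data frames and drop the list names as row names
res <- do.call(rbind, lapply(test, split_year))
rownames(res) <- NULL
res
```

Wrapping the result in `tibble::as_tibble(res)` should give the tibble asked for in the question.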

Upvotes: 3

akrun

Reputation: 887711

An option is to use rleid to create a grouping based on the occurrence of digits in each list element, then paste the pieces within each group and rbind the results

library(data.table)
out <- type.convert(do.call(rbind.data.frame, lapply(test, function(x)
    tapply(x,  rleid(grepl('\\d+', x)), paste, collapse=' '))), as.is = TRUE)
colnames(out) <- c('country', 'year', 'continent')
row.names(out) <- NULL

-output

out
#                   country year     continent
#1 United States of America 2021 North America
#2                   Canada 2021 North America
#3                  Morocco 2021        Africa
#4             South Africa 2021        Africa
#5            Faroe Islands 2021        Europe
#6                    Spain 2021        Europe

Or use a similar option with rle from base R

out <- type.convert(
    do.call(rbind.data.frame, lapply(test, function(x)
        tapply(x, with(rle(grepl('\\d+', x)), rep(seq_along(values), lengths)),
               FUN = paste, collapse = ' '))),
    as.is = TRUE)
colnames(out) <- c('country', 'year', 'continent')
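For anyone unsure what the `rle()` part contributes, this small sketch shows the run ids it produces for one element. The names `r` and `run_id` are illustrative only.

```r
x <- c("United", "States", "of", "America", "2021", "North", "America")

# Run-length encode the "contains digits" flag
r <- rle(grepl("\\d+", x))
r$values    # FALSE  TRUE FALSE
r$lengths   # 4 1 2

# Number the runs and expand back to one id per element
run_id <- rep(seq_along(r$values), r$lengths)
run_id      # 1 1 1 1 2 3 3
```

These ids are what `tapply()` uses to decide which words get pasted together.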

Upvotes: 2
