Forge
Forge

Reputation: 1677

dataframe column containing lists extracted as columns of the same dataframe

I have a dataframe with 3 columns. One of the columns (the second) contains a list of values per cell. Here dput sample data:

   df <- structure(list(column1 = c("HEATER", "COOLER"), column2 = list(structure(list(
        insidelist = structure(list(es = list("1"), en = list("00"), la = list(
            "01")), .Names = c("es", "en", "la"))), .Names = "insidelist"), 
        structure(list(insidelist = structure(list(es = list("1"), en = list(
            "01"), la = list("01")), .Names = c("es", "en", "la"))), .Names = "insidelist")), 
        column3 = c("88", "31")), .Names = c("column1", "column2", "column3"
    ), row.names = c(NA, -2L), class = "data.frame")

Giving this df:

  column1   column2     column3
1  HEATER   1, 00, 01      88
2  COOLER   1, 01, 01      31

How to get that list of values from second column as columns of the original dataframe?

Desired output:

  column1   column2 Column3 column4 column5
1  HEATER     1       00       01      88
2  COOLER     1       01       01      31

Upvotes: 1

Views: 97

Answers (3)

useRj
useRj

Reputation: 1312

Don't get me wrong, I love tidy-way of doing things as much as everyone here, and many people are learning R programming walking by an easier path thanks to it, but I think sometimes when you have a hammer everything looks like a nail.

Tidyverse has a lot of virtues, but some drawbacks too, one of them that seems to mask/hide the basics of the R language. In this case the most powerful and "human readable" solution (imho) is to mix approaches in a readable way.

Let’s take a look. First we get rid of nested lists converting them to data frame:

df$column2 <- data.frame(matrix(unlist(df$column2), nrow=nrow(df), byrow=T))

> df
  column1 column2.X1 column2.X2 column2.X3 column3
1  HEATER          1         00         01      88
2  COOLER          1         01         01      31

Then extract the inner data frame (column2) and put it side by side with original df:

df <- cbind(select(df,-column2), df$column2)

Selecting/Renaming columns is a trivial task. Here, an example after binding:

df <- cbind(df, df$column2) %>%
      select(Column1=1, Column2=4, Column3=5, Column4=6, Column5=3)

This gives us the desired output:

> df

  Column1 Column2 Column3 Column4 Column5
1  HEATER       1    00      01      88
2  COOLER       1    01      01      31

Plunging into tidy code sometimes ends in a not-so-tidy-solution. I know many people are learning R this way, but wise programmers should be wary of the dark places this can lead to if you rush to tidyverse for every problem not taking base R into account.

Upvotes: 1

akrun
akrun

Reputation: 887028

We can do

library(tidyverse)
df %>% 
  mutate(out = map(column2, ~ .x %>%
                              transpose %>%
                              unlist %>%
                              as.list %>% 
                              as_tibble)) %>% 
         unnest %>% 
         select(-column2)

Upvotes: 1

zack
zack

Reputation: 5405

Here's my go - not as concise as akrun and AntoniosK, but maybe a little more readable:

df %>% 
  unnest(column2) %>% 
  mutate(lengths = map_int(column2, ~ length(unlist(.x))),
         column2 = map_chr(column2, ~ glue::collapse(unlist(.x), sep = ',') )) %>% 
  separate(column2, sep = ',', into = paste('temp', seq(1,max(.$lengths)), sep = '_')) %>%
  select(column1, starts_with('temp'), column3) %>%
  setNames(paste0("column", 1:ncol(.)))

Just a note - it looks like the answers in comments run a bit faster, so if you're working with a large dataset - it may be wise to go with those.

Upvotes: 0

Related Questions