Fee

Reputation: 89

"un-list" lists in dataframe

I have a data frame with 3 columns, one of which is a list column. I need to match the variables in my data frame with the variables inside the lists, i.e. to "un-list" the lists.

To explain this better, here is an example of my data:

df:

 i.d.    registered_at     steps
 x        2013-12-20        list of dates and integers
 y        2013-10-01        list of dates and integers
 z        2014-01-15        list of dates and integers

my_list for x:

   Day           steps
2012-03-16        556
2012-04-22         3
2013-12-24        1119

The lists are of different lengths. I would like my data to look like this:

final_df:

 i.d.    registered_at         Day           steps
 x        2013-12-20        2012-03-16        556
 x        2013-12-20        2012-04-22         3
 x        2013-12-20        2013-12-24        1119
 y        2013-10-01        2013-09-08         19
 y        2013-10-01        2013-11-14        208
 z        2014-01-15        2014-01-19         5

I have tried the following:

df2 <- data.frame(matrix(unlist(df$steps), nrow = 957, byrow = T))


install.packages("plyr")
library(plyr)
df3 <- ldply (df$steps, data.frame)


unlist(df$steps, recursive = TRUE, use.names = TRUE)

The following shows the str() result for the first row of my data:

> str(ID1)
'data.frame':   1 obs. of  3 variables:
 $ id           : int 5
 $ registered_at: chr "2011-05-20"
 $ steps        :List of 1
  ..$ :'data.frame':    957 obs. of  2 variables:
  .. ..$ day  : chr  "2011-02-16" "2011-02-23" "2012-02-12" "2012-02-24" ...
  .. ..$ steps: int  1057 208 709 1221 8656 16279 11988 1628 1431 17379 ...

Below is a snapshot of the dput() result for one ID only. I used the first row of my data frame (for example "x") and shortened it with "..." because there were too many values to post here.

> dput(ID1)
structure(list(id = 5L, registered_at = "2011-05-20", steps = list(
    structure(list(day = c("2011-02-16", "2011-02-23", "2012-02-12", 
        "2012-02-24", ...), 
        steps = c(11057L, 208L, 709L, 1221L, 8656L, 16279L, 11988L, 1628L, 
        1431L, 17379L, ...
        )), .Names = c("day", "steps"), class = "data.frame", row.names = c(NA, 
    957L)))), .Names = c("id", "registered_at", "steps"), row.names = 1L, 
class = "data.frame")

> dput(head(df,5))
structure(c("function (x, df1, df2, ncp, log = FALSE) ", "{", 
"    if (missing(ncp)) ", "        .Call(C_df, x, df1, df2, log)", 
"    else .Call(C_dnf, x, df1, df2, ncp, log)"), .Dim = c(5L, 
1L), .Dimnames = list(c("1", "2", "3", "4", "5"), ""), class = 
"noquote")

Anyone got a tip? Thanks!

Upvotes: 1

Views: 125

Answers (3)

Fee

Reputation: 89

As Mikko Marttila commented, the simple answer is:

df2 <- tidyr::unnest(df, steps)
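For readers who want to try this without the original data, here is a minimal sketch on made-up data. The toy data frame below is an assumption that only mirrors the layout described in the question (an id, a registration date, and a list column of data frames with day and steps); the values are taken from the example tables in the question.

```r
library(tidyr)

# Hypothetical toy data mirroring the question's layout
df <- data.frame(id = c("x", "y"),
                 registered_at = c("2013-12-20", "2013-10-01"),
                 stringsAsFactors = FALSE)

# List column: one nested data frame per row, of different lengths
df$steps <- list(
  data.frame(day = c("2012-03-16", "2012-04-22", "2013-12-24"),
             steps = c(556L, 3L, 1119L), stringsAsFactors = FALSE),
  data.frame(day = c("2013-09-08", "2013-11-14"),
             steps = c(19L, 208L), stringsAsFactors = FALSE)
)

# Expand the list column: one output row per nested row,
# with id and registered_at repeated as needed
df2 <- unnest(df, steps)
```

The outer steps column is replaced by the columns of the nested data frames, so the result should have the columns id, registered_at, day and steps.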

Upvotes: 0

tushaR

Reputation: 3116

Try this please:

Based on the output of dput(ID1), I have created the following data.frame:

# Shortened version of ID1: only the first four rows of the nested data frame
df1 <- structure(list(id = 5L, registered_at = "2011-05-20", steps = list(
    structure(list(day = c("2011-02-16", "2011-02-23", "2012-02-12", "2012-02-24"),
                   steps = c(11057L, 208L, 709L, 1221L)),
              .Names = c("day", "steps"), class = "data.frame",
              row.names = c(NA, -4L)))),  # 4 rows here, not the original 957
    .Names = c("id", "registered_at", "steps"), row.names = 1L, class = "data.frame")

df1 looks like this:

>df1
#id registered_at                                                                 steps
#1  5    2011-05-20 2011-02-16, 2011-02-23, 2012-02-12, 2012-02-24, 11057, 208, 709, 1221

After that, you can use the ddply function from the plyr package to build the required data.frame:

library(plyr)

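# For each id, expand the nested data frame in `steps` into one row per day
ddply(.data = df1, .variables = "id", function(t) {
    nested <- t$steps[[1]]  # nested data frame for this id
    # id and registered_at are recycled; steps stays integer
    data.frame(id = t$id, registered_at = t$registered_at,
               day = nested$day, steps = nested$steps)
})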

This returns:

#  id registered_at        day steps
#1  5    2011-05-20 2011-02-16 11057
#2  5    2011-05-20 2011-02-23   208
#3  5    2011-05-20 2012-02-12   709
#4  5    2011-05-20 2012-02-24  1221

Upvotes: 1

Bertil Baron

Reputation: 5003

What about this?

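Test data:

library(tibble)

# Nested payload shared by every row
df_nest <- list(
  Date = c("2012-03-16", "2012-04-22", "2013-12-24"),
  number = c(556, 3, 1119)
)

# Dates are quoted; unquoted, 2013-12-20 would be evaluated as a subtraction
df <- tribble(
  ~id, ~important_date, ~dta,
  "x", "2013-12-20", df_nest,
  "y", "2013-12-18", df_nest,
  "z", "2013-12-16", df_nest
)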

Then we go through each row, expand the list, and bind the pieces together into a new data frame, result:

result <- NULL
for (row in seq_len(nrow(df))) {
  # Combine the scalar columns with the nested list; as_tibble() recycles the scalars
  row_df <- tibble::as_tibble(c(list(id = df$id[row], important_date = df$important_date[row]), df$dta[[row]]))
  result <- rbind(result, row_df)
}
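Growing result with rbind inside the loop copies the accumulated rows on every iteration, which can get slow with many ids. A possible variant, sketched here in base R only, collects one small data frame per row and binds them in a single call. The toy data is rebuilt with a plain data.frame so the block runs on its own; the names pieces and result are just illustrative.

```r
# Toy data in the same shape as the test data: one nested list per id
df_nest <- list(
  Date = c("2012-03-16", "2012-04-22", "2013-12-24"),
  number = c(556, 3, 1119)
)
df <- data.frame(id = c("x", "y", "z"),
                 important_date = c("2013-12-20", "2013-12-18", "2013-12-16"),
                 stringsAsFactors = FALSE)
df$dta <- list(df_nest, df_nest, df_nest)

# Build one small data frame per row; the scalar id and date are recycled
pieces <- lapply(seq_len(nrow(df)), function(row) {
  data.frame(id = df$id[row],
             important_date = df$important_date[row],
             Date = df$dta[[row]]$Date,
             number = df$dta[[row]]$number,
             stringsAsFactors = FALSE)
})

# Bind all pieces at once instead of growing the result step by step
result <- do.call(rbind, pieces)
```

This should give nine rows here (three nested entries for each of the three ids), and it also handles nested lists of different lengths, since each piece is built independently.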

Upvotes: 0
