Reputation: 89
I have a data frame with three columns, one of which is a list column. I need to match the data frame's variables with the variables inside those lists, so in effect un-list the lists.
To explain this better, here is an example of my data:
df:
i.d. registered_at steps
x 2013-12-20 list of dates and integers
y 2013-10-01 list of dates and integers
z 2014-01-15 list of dates and integers
my_list for x:
Day steps
2012-03-16 556
2012-04-22 3
2013-12-24 1119
The lists are of different lengths. I would like my data to look like this:
final_df:
i.d. registered_at Day steps
x 2013-12-20 2012-03-16 556
x 2013-12-20 2012-04-22 3
x 2013-12-20 2013-12-24 1119
y 2013-10-01 2013-09-08 19
y 2013-10-01 2013-11-14 208
z 2014-01-15 2014-01-19 5
I have tried the following:
df2 <- data.frame(matrix(unlist(df$steps), nrow = 957, byrow = T))
install.packages("plyr")
library(plyr)
df3 <- ldply (df$steps, data.frame)
unlist(df$steps, recursive = TRUE, use.names = TRUE)
The following shows the str() result for the first row of my data:
> str(ID1)
'data.frame': 1 obs. of 3 variables:
$ id : int 5
$ registered_at: chr "2011-05-20"
$ steps :List of 1
..$ :'data.frame': 957 obs. of 2 variables:
.. ..$ day : chr "2011-02-16" "2011-02-23" "2012-02-12" "2012-02-24" ...
.. ..$ steps: int 1057 208 709 1221 8656 16279 11988 1628 1431 17379 ...
Further, here is a snapshot of the dput() result for one ID only. I used the first row of my data frame (for example "x") and shortened it with "..." because there were too many values to post here.
> dput(ID1)
structure(list(id = 5L, registered_at = "2011-05-20", steps = list(
structure(list(day = c("2011-02-16", "2011-02-23", "2012-02-12",
"2012-02-24", ...),
steps = c(11057L, 208L, 709L, 1221L, 8656L, 16279L, 11988L, 1628L,
1431L, 17379L, ...
)), .Names = c("day", "steps"), class = "data.frame", row.names = c(NA,
957L)))), .Names = c("id", "registered_at", "steps"), row.names = 1L,
class = "data.frame")
> dput(head(df,5))
structure(c("function (x, df1, df2, ncp, log = FALSE) ", "{",
" if (missing(ncp)) ", " .Call(C_df, x, df1, df2, log)",
" else .Call(C_dnf, x, df1, df2, ncp, log)"), .Dim = c(5L,
1L), .Dimnames = list(c("1", "2", "3", "4", "5"), ""), class =
"noquote")
Anyone got a tip? Thanks!
Upvotes: 1
Views: 125
Reputation: 89
As Mikko Marttila commented, the simple answer is:
df2 <- tidyr::unnest(df, steps)
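To make that one-liner concrete, here is a minimal sketch on a made-up miniature of the question's data. The values are copied from the desired final_df in the question; the data frame itself is hypothetical, and the `cols =` spelling assumes tidyr 1.0.0 or later.

```r
library(tidyr)

# Hypothetical miniature of the question's data: one row per id,
# with a nested data.frame of (day, steps) stored in a list column.
df <- data.frame(
  id = c("x", "y"),
  registered_at = c("2013-12-20", "2013-10-01"),
  stringsAsFactors = FALSE
)
df$steps <- list(
  data.frame(day = c("2012-03-16", "2012-04-22", "2013-12-24"),
             steps = c(556L, 3L, 1119L),
             stringsAsFactors = FALSE),
  data.frame(day = c("2013-09-08", "2013-11-14"),
             steps = c(19L, 208L),
             stringsAsFactors = FALSE)
)

# Expand the list column: each nested row becomes its own row,
# and id / registered_at are repeated alongside it.
final_df <- unnest(df, cols = steps)
```

The nested data frames may have different numbers of rows; each id simply contributes as many rows as its nested frame holds.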
Upvotes: 0
Reputation: 3116
Try this, please. Based on the output of dput(ID1), I have created the following data.frame:
# Only four of the 957 nested rows are kept, so the nested row.names must say 4 rows
df1 = structure(list(id = 5L, registered_at = "2011-05-20", steps = list(
structure(list(day = c("2011-02-16", "2011-02-23", "2012-02-12", "2012-02-24"),
steps = c(11057L, 208L, 709L, 1221L)), .Names = c("day", "steps"), class = "data.frame",
row.names = c(NA, -4L)))), .Names = c("id", "registered_at", "steps"), row.names = 1L,
class = "data.frame")
df1 looks like this:
>df1
#id registered_at steps
#1 5 2011-05-20 2011-02-16, 2011-02-23, 2012-02-12, 2012-02-24, 11057, 208, 709, 1221
After that, using the plyr package's ddply function, you can easily create the required data.frame like this:
library(plyr)
# For each id, pull the nested data.frame out of the list column and
# repeat id / registered_at once per nested row.
ddply(.data = df1, .variables = 'id', function(d) {
  nested <- d$steps[[1]]
  data.frame(id = d$id, registered_at = d$registered_at,
             day = nested$day, steps = nested$steps)
})
This returns:
# id registered_at day steps
#1 5 2011-05-20 2011-02-16 11057
#2 5 2011-05-20 2011-02-23 208
#3 5 2011-05-20 2012-02-12 709
#4 5 2011-05-20 2012-02-24 1221
Upvotes: 1
Reputation: 5003
What about this?
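Test data:
library(tibble)
library(magrittr)  # provides %>%
df_nest <- list(
  Date = c("2012-03-16", "2012-04-22", "2013-12-24"),
  number = c(556, 3, 1119)
)
# The dates are quoted: unquoted, 2013-12-20 would be evaluated as a subtraction
df <- tribble(
  ~id, ~important_date, ~dta,
  "x", "2013-12-20", df_nest,
  "y", "2013-12-18", df_nest,
  "z", "2013-12-16", df_nest
)
Then we go through each row, expand the list, and bind the pieces together into a new data frame, result:
result = NULL
for (row in seq_len(nrow(df))) {
  # c() builds one list per row; as_tibble() recycles the length-one id and date
  expanded = c(id = df$id[row], important_date = df$important_date[row],
               df$dta[row] %>% unlist(recursive = FALSE)) %>% as_tibble()
  result = rbind(result, expanded)
}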
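The same row-by-row idea can also be sketched in base R without growing result inside the loop: build one piece per row with lapply() and bind them once at the end. The data below is a hypothetical reconstruction using the values from the question's desired final_df, with the question's column names rather than the test data above.

```r
# Hypothetical data: a list column of data.frames with different row counts
df <- data.frame(
  id = c("x", "y", "z"),
  registered_at = c("2013-12-20", "2013-10-01", "2014-01-15"),
  stringsAsFactors = FALSE
)
df$steps <- list(
  data.frame(day = c("2012-03-16", "2012-04-22", "2013-12-24"),
             steps = c(556L, 3L, 1119L), stringsAsFactors = FALSE),
  data.frame(day = c("2013-09-08", "2013-11-14"),
             steps = c(19L, 208L), stringsAsFactors = FALSE),
  data.frame(day = "2014-01-19", steps = 5L, stringsAsFactors = FALSE)
)

# One small data.frame per original row: the outer values are repeated
# once for every row of the nested data.frame.
pieces <- lapply(seq_len(nrow(df)), function(i) {
  nested <- df$steps[[i]]
  n <- nrow(nested)
  data.frame(
    id = rep(df$id[i], n),
    registered_at = rep(df$registered_at[i], n),
    day = nested$day,
    steps = nested$steps,
    stringsAsFactors = FALSE
  )
})

# Bind all pieces in a single call instead of calling rbind() in every iteration
final_df <- do.call(rbind, pieces)
```

Binding once at the end avoids copying the accumulated result on every iteration, which can matter when each id carries hundreds of nested rows.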
Upvotes: 0