giac

Reputation: 4299

R - dplyr map slice for repeat rows

I have trouble combining slice and map.

I am interested in doing something similar to this, which in my case means transforming a compact person-period file into a long (sequential) person-period one. However, because my file is too big, I need to split the data first.

My data look like this

    group id var ep dur
1      A  1   a  1  20
2      A  1   b  2  10
3      A  1   a  3   5
4      A  2   b  1   5
5      A  2   b  2  10
6      A  2   b  3  15
7      B  1   a  1  20
8      B  1   a  2  10
9      B  1   a  3  10
10     B  2   c  1  20
11     B  2   c  2   5
12     B  2   c  3  10

What I need is simply this (answer from this)

library(dplyr)
dt %>% slice(rep(1:n(),.$dur))

However, I am interested in introducing a split(.$group).

How am I supposed to do so? For example, this does not work:

dt %>% split(.$group) %>% map_df(slice(rep(1:n(),.$dur)))

My desired output is the same as dt %>% slice(rep(1:n(),.$dur)) which is

     group id var ep dur
1       A  1   a  1  20
2       A  1   a  1  20
3       A  1   a  1  20
4       A  1   a  1  20
5       A  1   a  1  20
6       A  1   a  1  20
7       A  1   a  1  20
8       A  1   a  1  20
9       A  1   a  1  20
10      A  1   a  1  20
.....

But I need to split this operation because the file is too big.

data

dt = structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), 
id = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 
2L, 2L), .Label = c("1", "2"), class = "factor"), var = structure(c(1L, 
2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L), .Label = c("a", 
"b", "c"), class = "factor"), ep = structure(c(1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1", "2", 
"3"), class = "factor"), dur = c(20, 10, 5, 5, 10, 15, 20, 
10, 10, 20, 5, 10)), .Names = c("group", "id", "var", "ep", 
"dur"), row.names = c(NA, -12L), class = "data.frame")

Upvotes: 1

Views: 1763

Answers (2)

Jim Leach

Reputation: 459

I'm not quite sure what your desired final output is, but you could use tidyr to nest the data that you want to repeat and a simple function to expand levels of your nested data, very similar to Tutuchan's answer.

library(dplyr)

# Repeat the rows of df `repeats` times
expand_df <- function(df, repeats) {
  df %>% slice(rep(1:n(), repeats))
}

dt %>% 
    tidyr::nest(data = c(var, ep)) %>%   # tidyr >= 1.0 syntax
    mutate(expanded = purrr::map2(data, dur, expand_df)) %>% 
    select(-data) %>% 
    tidyr::unnest(expanded)

Tutuchan's answer gives exactly the same output as your original approach - is that what you were looking for? I don't know if it will have any advantage over your original method.

Upvotes: 1

Tutuchan

Reputation: 1567

map takes two arguments: a vector/list in .x and a function in .f. It then applies .f to each element of .x.
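As a minimal sketch of that behaviour (the list `pieces` is a made-up stand-in for what split() returns, not your data), both calls below hand map a function rather than the result of calling one:

```r
library(purrr)

# Hypothetical stand-in for the pieces produced by split()
pieces <- list(A = c(1, 2, 3), B = c(10, 20))

# .f is a plain function: map() calls it once per element of .x
sums <- map(pieces, sum)

# Formula shorthand: purrr turns it into a function, .x is the current element
doubled <- map(pieces, ~ .x * 2)
```

In both cases map receives something it can call on each piece, which is what your original attempt was missing.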

The function you are passing to map is not formatted correctly: slice(rep(1:n(),.$dur)) is a call to slice, not a function that map can apply to each piece. Try this:

f <- function(x) x %>% slice(rep(1:n(), .$dur))
dt %>% 
  split(.$group) %>% 
  map_df(f)

You could also use it like this:

dt %>% 
  split(.$group) %>% 
  map_df(slice, rep(1:n(), dur))

This time you directly pass the slice function to map with additional parameters.
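For a quick check that the split version matches the unsplit one, here is a self-contained sketch. It assumes a simplified copy of your dt with plain character and numeric columns instead of factors, and it uses purrr's formula shorthand, which is a third way of writing the same function:

```r
library(dplyr)
library(purrr)

# Simplified copy of the question's data (plain columns instead of factors)
dt <- data.frame(
  group = rep(c("A", "B"), each = 6),
  id    = rep(c(1, 2, 1, 2), each = 3),
  var   = c("a", "b", "a", "b", "b", "b", "a", "a", "a", "c", "c", "c"),
  ep    = rep(1:3, times = 4),
  dur   = c(20, 10, 5, 5, 10, 15, 20, 10, 10, 20, 5, 10)
)

# Split by group, repeat each row dur times within its piece, bind the pieces
by_group <- dt %>%
  split(.$group) %>%
  map_df(~ slice(.x, rep(seq_len(n()), .x$dur)))

# Same operation on the whole data frame, for comparison
whole <- dt %>% slice(rep(seq_len(n()), dur))

# Both should contain one row per unit of dur
nrow(by_group) == sum(dt$dur)
nrow(whole) == sum(dt$dur)
```

Keep in mind that map_df still binds every piece back into a single data frame, so the final object is as large as the unsplit result.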

Upvotes: 3
