Carina Edel

Reputation: 37

How to make calculations with specific rows with purrr nested data

I just started working with purrr and nested data. I love it, but I'm also kind of lost.

What I have is a tibble that looks roughly like this:

library(tidyverse)

test <- tibble(
  id= rep(1:3, each=20),
  Index = rep(1:20, 3),
  x = rnorm(60),
  y = rnorm(60),
  z = rnorm(60)
)

id  Index    x         y        z
1     1     0.03     -0.39     0.4
1     2     1.2      -0.49     0.6
1     3     1.6      -0.59     0.7
....
2     1     0.2      -6.2      0.1
2     2     1.1      -6.3      0.6
2     3     1.5      -5.1      0.4
...

I nested the data by id:

t_nest <- test %>% group_by(id) %>% nest()

# A tibble: 3 x 2
# Groups:   id [3]
     id data
  <int> <list>
1     1 <tibble [20 x 4]>
2     2 <tibble [20 x 4]>
3     3 <tibble [20 x 4]>

What I now want to do is calculate the difference in x between the first and the second Index of each group. I worked around this by mutating a new list column that holds only the rows for the first two indexes. Then I unnested that column, did the calculation, and dropped the extra columns again.

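# Absolute difference between the first two elements of x
inlever <- function(x) {
  abs(x[[1]] - x[[2]])
}

test_inlever <- t_nest %>%
  # keep only the rows with Index 1 or 2; %in% avoids the silent
  # vector recycling that Index == c("1", "2") relies on
  mutate(inlever_coord = map(data, ~ filter(., Index %in% c(1, 2)))) %>%
  unnest(inlever_coord) %>%
  group_by(id) %>%
  mutate(inlever_d = inlever(x)) %>%
  select(-c(x, y, z, Index))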


# A tibble: 6 x 3
# Groups:   id [3]
     id data              inlever
  <int> <list>              <dbl>
1     1 <tibble [20 x 4]>   1.68
2     1 <tibble [20 x 4]>   1.68
3     2 <tibble [20 x 4]>   0.964
4     2 <tibble [20 x 4]>   0.964
5     3 <tibble [20 x 4]>   0.135
6     3 <tibble [20 x 4]>   0.135


My questions are:

  1. Is there an easier way to do it? Can I directly calculate within the nested data by only selecting the two rows I want to use?
  2. Is there a way to rename the data part in the nested table? Instead of "data" I want it to be called "coordinates" like so:
# A tibble: 3 x 2
# Groups:   id [3]
     id coordinates
  <int> <list>
1     1 <tibble [20 x 4]>
2     2 <tibble [20 x 4]>
3     3 <tibble [20 x 4]>

Upvotes: 2

Views: 137

Answers (2)

caldwellst

Reputation: 5956

Is this what you are looking for? It can be done in a few lines with an anonymous function, straight from your original data frame. We first arrange the data so we know the indices are in the right order, then nest it (with your new column name), and finally use a single mutate to do the calculation.


test %>%
  arrange(id, Index) %>%
  nest(coordinates = -id) %>%
  mutate(inlever_d = map_dbl(coordinates, ~ abs(.x[['x']][1] - .x[['x']][2])))
#> # A tibble: 3 x 3
#>      id coordinates       inlever_d
#>   <int> <list>                <dbl>
#> 1     1 <tibble [20 x 4]>     0.330
#> 2     2 <tibble [20 x 4]>     0.850
#> 3     3 <tibble [20 x 4]>     0.487
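If you would rather not depend on the row order, the two rows can also be picked by their Index value inside each nested tibble. This is only a minimal sketch: inlever_by_index is a hypothetical helper name, not part of purrr or tidyr, and it assumes that each id has exactly one row with Index 1 and one row with Index 2.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(123)
test <- tibble(
  id    = rep(1:3, each = 20),
  Index = rep(1:20, 3),
  x     = rnorm(60)
)

# Hypothetical helper: absolute difference in x between the rows whose
# Index is 1 and 2, looked up by value rather than by position.
# Assumes each of those Index values occurs exactly once per nested tibble.
inlever_by_index <- function(df) {
  abs(df$x[df$Index == 1] - df$x[df$Index == 2])
}

res <- test %>%
  nest(coordinates = -id) %>%
  mutate(inlever_d = map_dbl(coordinates, inlever_by_index))

res
```

Because the lookup is by value, no arrange() step is needed here. If an Index value is missing or duplicated in a group, map_dbl() stops with an error instead of silently returning a wrong number.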

Upvotes: 2

Ronak Shah

Reputation: 388817

I would do the inlever calculation separately, before nesting the data, using the inlever() function from the question. If the nested data is needed as well, it can be added to the result with a join.

library(dplyr)

test %>%
  filter(Index %in% c(1, 2)) %>%
  group_by(id) %>%
  summarise(inlever_d = inlever(x)) %>%
  left_join(test %>% tidyr::nest(coordinates = -id), by = 'id')


# A tibble: 3 x 3
#     id inlever_d coordinates      
#  <int>     <dbl> <list>           
#1     1     0.330 <tibble [20 × 4]>
#2     2     0.850 <tibble [20 × 4]>
#3     3     0.487 <tibble [20 × 4]>
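The join above creates the list column with its new name directly. If the data has already been nested as in the question (so the list column is called data), it should also be possible to rename it afterwards with dplyr's rename(). A minimal sketch, using a small made-up data set purely for illustration:

```r
library(dplyr)
library(tidyr)

# Small made-up data set, only for illustration
test <- tibble(
  id    = rep(1:3, each = 2),
  Index = rep(1:2, 3),
  x     = c(0.1, 0.5, 1.2, 0.2, -0.3, 0.4)
)

# Nest as in the question: the list column gets the default name "data"
t_nest <- test %>%
  group_by(id) %>%
  nest()

# Rename the list column like any other column (new_name = old_name)
t_renamed <- t_nest %>%
  rename(coordinates = data)

names(t_renamed)
```

The list column behaves like any other column here, so the usual new_name = old_name syntax applies.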

data

set.seed(123)
test <- tibble(
          id= rep(1:3, each=20),
          Index = rep(1:20, 3),
          x = rnorm(60),
          y = rnorm(60),
          z = rnorm(60)
          )

Upvotes: 1
