Wadim iLchuk
Wadim iLchuk

Reputation: 27

For loop in R to rewrite initial datasets

UPD: HERE what I need:

Example of some datasets are here (I have 8 of them): https://drive.google.com/drive/folders/1gBV2ZkywW6JqDjRICafCwtYhh2DHWaUq?usp=sharing

What I need is:

For example, in those datasets there is lev variable. Let's say this is a snapshot of the data in these datasets:

ID   Year  lev   
1    2011  0.19 
1    2012  0.19 
1    2013  0.21 
1    2014  0.18 
2    2013  0.39 
2    2014  0.15 
2    2015  0.47 
2    2016  0.35 
3    2013  0.30 
3    2015  0.1  
3    2017  0.13 
3    2018  0.78 
4    2011  0.13 
4    2012  0.35 

Now, I need to create in each of my datasets EE_AB, EE_C, EE_H, etc., create variables ff1 and ff2 which are constructed for year ID, in each year respectively to the median of the whole IDs in that particular year.

Let's take an example of the year 2011. The median of the variable lev in this dataset in 2011 is (0.19+0.13)/2 = 0.16, so ff1 for ID 1 in 2011 should be 0.19/0.16 = 1.1875, and for ID 4 in 2011 ff1 = 0.13/0.16 = 0.8125.

Now let's take the example of 2013. The median lev is 0.3. so ff1 for ID 1, 2, 3 will be 0.7, 1.3, 1 respectively.

The desired output should be the ff1 variable in each dataset (e.g., EE_AB, EE_C, EE_H) as:

ID   Year  lev    ff1    
1    2011  0.19  1.1875
1    2012  0.19  0.7037
1    2013  0.21  0.7
1    2014  0.18  1.0909
2    2013  0.39  1.3
2    2014  0.15  0.9091
2    2015  0.47  1.6491
2    2016  0.35  1
3    2013  0.30  1
3    2015  0.1   0.3509
3    2017  0.13  1
3    2018  0.78  1
4    2011  0.13  0.8125
4    2012  0.35  1.2963

And this should be in the same way for other dataframes.

Upvotes: 0

Views: 55

Answers (1)

r2evans
r2evans

Reputation: 160447

Here's a tidyverse method:

library(dplyr)
# library(purrr)
data_frameAB %>%
  group_by(Year) %>%
  mutate(ff1 = (c+d) / purrr::map2_dbl(c, d, median)) %>%
  ungroup()
# # A tibble: 14 x 5
#       ID  Year     c     d   ff1
#    <dbl> <dbl> <dbl> <dbl> <dbl>
#  1     1  2011    10    12  2.2 
#  2     1  2012    11    13  2.18
#  3     1  2013    12    14  2.17
#  4     1  2014    13    15  2.15
#  5     1  2015    14    16  2.14
#  6     1  2016    15    34  3.27
#  7     1  2017    16    25  2.56
#  8     1  2018    17    26  2.53
#  9     1  2019    18    56  4.11
# 10    15  2015    23    38  2.65
# 11    15  2016    26    25  1.96
# 12    15  2017    30    38  2.27
# 13    45  2011   100   250  3.5 
# 14    45  2012   200   111  1.56

Without purrr, that inner expression would be

  mutate(ff1 = (c+d) / mapply(median, c, d))

albeit with type-safeness.

Since you have multiple frames in your data management, I have two suggestions:

  1. Combine them into a list. This recommendation stems off the assumption that whatever you're doing to one frame you are likely to do all three. In that case, you can use lapply or purrr::map on the list of frames, doing all frames in one step. See https://stackoverflow.com/a/24376207/3358227.

    list_of_frames <- list(AB=data_frameAB, C=data_frameC, F=data_frameF)
    list_of_frames2 <- purrr::map(
      list_of_frames,
      ~ .x %>%
        group_by(Year) %>%
        mutate(ff1 = (c+d) / purrr::map2_dbl(c, d, median)) %>% ungroup()
    )
    

    Again, without purrr, that would be

    list_of_frames2 <- lapply(
      list_of_frames,
      function(.x) group_by(.x, Year) %>%
        mutate(ff1 = (c+d) / mapply(median c, d)) %>%
        ungroup()
    )
    
  2. Combine them into one frame, preserving the original data. Starting with list_of_frames,

    bind_rows(list_of_frames, .id = "Frame") %>%
      group_by(Frame, Year) %>%
      mutate(ff1 = (c+d) / purrr::map2_dbl(c, d, median)) %>%
      ungroup()
    # # A tibble: 42 x 6
    #    Frame    ID  Year     c     d   ff1
    #    <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
    #  1 AB        1  2011    10    12  2.2 
    #  2 AB        1  2012    11    13  2.18
    #  3 AB        1  2013    12    14  2.17
    #  4 AB        1  2014    13    15  2.15
    #  5 AB        1  2015    14    16  2.14
    #  6 AB        1  2016    15    34  3.27
    #  7 AB        1  2017    16    25  2.56
    #  8 AB        1  2018    17    26  2.53
    #  9 AB        1  2019    18    56  4.11
    # 10 AB       15  2015    23    38  2.65
    # # ... with 32 more rows
    

Upvotes: 1

Related Questions