Reputation: 27
UPD: HERE what I need:
Example of some datasets are here (I have 8 of them): https://drive.google.com/drive/folders/1gBV2ZkywW6JqDjRICafCwtYhh2DHWaUq?usp=sharing
What I need is:
For example, in those datasets there is lev variable. Let's say this is a snapshot of the data in these datasets:
ID Year lev
1 2011 0.19
1 2012 0.19
1 2013 0.21
1 2014 0.18
2 2013 0.39
2 2014 0.15
2 2015 0.47
2 2016 0.35
3 2013 0.30
3 2015 0.1
3 2017 0.13
3 2018 0.78
4 2011 0.13
4 2012 0.35
Now, I need to create in each of my datasets EE_AB, EE_C, EE_H, etc., create variables ff1 and ff2 which are constructed for year ID, in each year respectively to the median of the whole IDs in that particular year.
Let's take an example of the year 2011. The median of the variable lev in this dataset in 2011 is (0.19+0.13)/2 = 0.16, so ff1 for ID 1 in 2011 should be 0.19/0.16 = 1.1875, and for ID 4 in 2011 ff1 = 0.13/0.16 = 0.8125.
Now let's take the example of 2013. The median lev is 0.3. so ff1 for ID 1, 2, 3 will be 0.7, 1.3, 1 respectively.
The desired output should be the ff1 variable in each dataset (e.g., EE_AB, EE_C, EE_H) as:
ID Year lev ff1
1 2011 0.19 1.1875
1 2012 0.19 0.7037
1 2013 0.21 0.7
1 2014 0.18 1.0909
2 2013 0.39 1.3
2 2014 0.15 0.9091
2 2015 0.47 1.6491
2 2016 0.35 1
3 2013 0.30 1
3 2015 0.1 0.3509
3 2017 0.13 1
3 2018 0.78 1
4 2011 0.13 0.8125
4 2012 0.35 1.2963
And this should be in the same way for other dataframes.
Upvotes: 0
Views: 55
Reputation: 160447
Here's a tidyverse method:
library(dplyr)
# library(purrr)
data_frameAB %>%
group_by(Year) %>%
mutate(ff1 = (c+d) / purrr::map2_dbl(c, d, median)) %>%
ungroup()
# # A tibble: 14 x 5
# ID Year c d ff1
# <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 2011 10 12 2.2
# 2 1 2012 11 13 2.18
# 3 1 2013 12 14 2.17
# 4 1 2014 13 15 2.15
# 5 1 2015 14 16 2.14
# 6 1 2016 15 34 3.27
# 7 1 2017 16 25 2.56
# 8 1 2018 17 26 2.53
# 9 1 2019 18 56 4.11
# 10 15 2015 23 38 2.65
# 11 15 2016 26 25 1.96
# 12 15 2017 30 38 2.27
# 13 45 2011 100 250 3.5
# 14 45 2012 200 111 1.56
Without purrr
, that inner expression would be
mutate(ff1 = (c+d) / mapply(median, c, d))
albeit with type-safeness.
Since you have multiple frames in your data management, I have two suggestions:
Combine them into a list
. This recommendation stems off the assumption that whatever you're doing to one frame you are likely to do all three. In that case, you can use lapply
or purrr::map
on the list of frames, doing all frames in one step. See https://stackoverflow.com/a/24376207/3358227.
list_of_frames <- list(AB=data_frameAB, C=data_frameC, F=data_frameF)
list_of_frames2 <- purrr::map(
list_of_frames,
~ .x %>%
group_by(Year) %>%
mutate(ff1 = (c+d) / purrr::map2_dbl(c, d, median)) %>% ungroup()
)
Again, without purrr
, that would be
list_of_frames2 <- lapply(
list_of_frames,
function(.x) group_by(.x, Year) %>%
mutate(ff1 = (c+d) / mapply(median c, d)) %>%
ungroup()
)
Combine them into one frame, preserving the original data. Starting with list_of_frames
,
bind_rows(list_of_frames, .id = "Frame") %>%
group_by(Frame, Year) %>%
mutate(ff1 = (c+d) / purrr::map2_dbl(c, d, median)) %>%
ungroup()
# # A tibble: 42 x 6
# Frame ID Year c d ff1
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 AB 1 2011 10 12 2.2
# 2 AB 1 2012 11 13 2.18
# 3 AB 1 2013 12 14 2.17
# 4 AB 1 2014 13 15 2.15
# 5 AB 1 2015 14 16 2.14
# 6 AB 1 2016 15 34 3.27
# 7 AB 1 2017 16 25 2.56
# 8 AB 1 2018 17 26 2.53
# 9 AB 1 2019 18 56 4.11
# 10 AB 15 2015 23 38 2.65
# # ... with 32 more rows
Upvotes: 1