Dominic Oberle
Dominic Oberle

Reputation: 1

Adding a new column from values of rows below, with the row required alternating depending on a variable

I am new to R and having trouble trying to execute my plan.

I am trying to add another column to my data frame with the values coming from the rows below. However, each row must come from a different number of rows below.

I have 6-7 age cohorts and temperature values for multiple years. Cohort 1 will have that year's temperature value, cohort 2 will have the year prior, cohort 3 will have the temperature value 2 years prior, etc.

This is the data in a wide format. Wide format data. I will need the final output in a long format, such as long format

This is my current workings

    long_density %>% 
  mutate( Summer_prior_2 = Summer_max_prior) %>% 
  mutate(Summer_prior_2 = c(Summer_max_prior[-1]))
  select(Year, Cohort, Density, Summer_max_prior, Summer_prior_2, Winter_min_post, Summer_max_post) %>%
  View
  
long_density %>% 
  mutate( Summer_prior_2 = Summer_max_prior) %>% 
  mutate(Summer_prior_2 = ifelse("Cohort" == "Cohort_1",  Summer_prior_2,
                                   if_else("Cohort" == "Cohort_2",Summer_prior_2[-7], NA))) %>% 
  View

I was thinking a "ifelse" code where cohort_1 = Cohort_1, Cohort_2= Cohort_2 - 7, Cohort_3= Cohort_3 - 14, Cohort_4= Cohort_4 - 21, etc.. As there are 7 cohorts in in each year the value I want is increasing by 7 for each cohort.

EDIT

Sorry, my original question wasn't too clear.

I know how to pivot_longer. It's mutating the temperature data depending on the cohort that I am having difficulties with.

Here is an example data frame:

 df <- data.frame (Year<- as.numeric (c("2021","2020","2019","2018","2017")),
              Cohort_1 <- as.numeric (c("12", "13", "12", "14", "20")),
              Cohort_2 <- as.numeric (c("23", "22", "23", "26", "29")),
              Cohort_3 <- as.numeric (c("32", "32", "40", "35", "34")),
              Cohort_4 <- as.numeric (c("44", "43", "40", "49", "46")),
              Cohort_5 <- as.numeric (c("56", "49", "41", "50", "55")),
              Cohort_6 <- as.numeric (c("66", "61", "62", "69", "68")),
              Cohort_7 <- as.numeric (c("77", "90", "82", "84", "79")),
              Summer_max_prior <- as.numeric (c("2","3","4","6","4")),
              Winter_min_post <- as.numeric (c("1","2","0","3","1")))
> df
  Year Cohort_1 Cohort_2 Cohort_3 Cohort_4 Cohort_5 Cohort_6 Cohort_7 Summer_max Winter_min
1 2021       12       23       32       44       56       66       77          2          1
2 2020       13       22       32       43       49       61       90          3          2
3 2019       12       23       40       40       41       62       82          4          0
4 2018       14       26       35       49       50       69       84          6          3
5 2017       20       29       34       46       55       68       79          4          1

This is the pivot_longer code.

long_density <- pivot_longer(df,cols=c("Cohort_1","Cohort_2","Cohort_3","Cohort_4","Cohort_5","Cohort_6","Cohort_7"), names_to= "Cohort",
                 values_to= "Density")

> long_density %>% select(Year, Cohort, Density, Summer_max, Winter_min)

    Year Cohort   Density Summer_max Winter_min
   
 1  2021 Cohort_1      12          2          1
 2  2021 Cohort_2      23          2          1
 3  2021 Cohort_3      32          2          1
 4  2021 Cohort_4      44          2          1
 5  2021 Cohort_5      56          2          1
 6  2021 Cohort_6      66          2          1
 7  2021 Cohort_7      77          2          1
 8  2020 Cohort_1      13          3          2
 9  2020 Cohort_2      22          3          2
10  2020 Cohort_3      32          3          2

I need both Summer_max and Winter_min data for cohort 2 (2-year-olds) to be from the year before, cohort 3 (3-year-olds) to be 3 years before, and so on. I need the corresponding temperatures for the year that the cohort of fish spawned.

At the moment all values for each cohort within each year are the same. This is the code I have written so far but r does not like my ifelse code.

long_density %>% 
      mutate(Summer_prior_2 = Summer_max_prior) %>% 
      mutate(Summer_prior_2 = ifelse(long_density$Cohort == "Cohort_1",  Summer_max_prior,
                                       if_else(long_density$Cohort == "Cohort_2",Summer_max_prior[-7], 
                                                if_else(long_density$Cohort == "Cohort_3",Summer_max_prior[-14], 
                                                         if_else(long_density$Cohort == "Cohort_4",Summer_max_prior[-21], 
                                                                  if_else(long_density$Cohort == "Cohort_5",Summer_max_prior[-28], 
                                                                           if_else(long_density$Cohort == "Cohort_6",Summer_max_prior[-35], 
                                                                                    if_else(long_density$Cohort == "Cohort_7",Summer_max_prior[-42], NA)))))))) 

Any help would be greatly appreciated!

Upvotes: 0

Views: 545

Answers (1)

fbeese
fbeese

Reputation: 118

I hope I understand the question more clearly now. I added some comments to try to explain what the functions do:

Preprocessing:

library(tidyverse, warn.conflicts = FALSE)

df <- data.frame(Year = c(2021,2020,2019,2018,2017), # if you remove the quotes, R will automatically identify the data as numeric values
                 Cohort_1 = c(12, 13, 12, 14, 20),
                 Cohort_2 =  c(23, 22, 23, 26, 29),
                 Cohort_3 =  c(32, 32, 40, 35, 34),
                 Cohort_4 =  c(44, 43, 40, 49, 46),
                 Cohort_5 =  c(56, 49, 41, 50, 55),
                 Cohort_6 =  c(66, 61, 62, 69, 68),
                 Cohort_7 =  c(77, 90, 82, 84, 79),
                 Summer_max_prior =  c(2,3,4,6,4),
                 Winter_min_post =  c(1,2,0,3,1))

long_density <- df %>%
  pivot_longer(cols=starts_with("Cohort_"), # useful helper function (starts_with) to avoid using a long character vector as in your example; more helper functions here: https://dplyr.tidyverse.org/reference/select.html
               names_to = "Cohort",
               values_to = "Density") %>%
  select(Year, Cohort, Density, Summer_max_prior, Winter_min_post)

> long_density
# A tibble: 35 x 5
    Year Cohort   Density Summer_max_prior Winter_min_post
   <dbl> <chr>      <dbl>            <dbl>           <dbl>
 1  2021 Cohort_1      12                2               1
 2  2021 Cohort_2      23                2               1
 3  2021 Cohort_3      32                2               1
 4  2021 Cohort_4      44                2               1
 5  2021 Cohort_5      56                2               1
 6  2021 Cohort_6      66                2               1
 7  2021 Cohort_7      77                2               1
 8  2020 Cohort_1      13                3               2
 9  2020 Cohort_2      22                3               2
10  2020 Cohort_3      32                3               2
# ... with 25 more rows

One option is to generate a separate data.frame that contains the temperatures of each year and match it with the long_density data:

temperatures <- long_density %>%
  group_by(Year) %>% # group the data for year
  filter(row_number()==1) %>% # filter for the first row of each group (Year)
  select(!c(Density,Cohort)) # remove Density and Cohort

> temperatures
# A tibble: 5 x 3
# Groups:   Year [5]
   Year Summer_max_prior Winter_min_post
  <dbl>            <dbl>           <dbl>
1  2021                2               1
2  2020                3               2
3  2019                4               0
4  2018                6               3
5  2017                4               1

long_density2 <- long_density %>%
  mutate(Summer_prior_2 = case_when( # vectorized if-function: https://dplyr.tidyverse.org/reference/case_when.html
    Cohort == "Cohort_1" ~ temperatures$Summer_max_prior[1], # if Cohort is equal to Cohort_1 set the Summer_max_prior from the first row from the Temperatures data 
    Cohort == "Cohort_2" ~ temperatures$Summer_max_prior[2],
    Cohort == "Cohort_3" ~ temperatures$Summer_max_prior[3],
    Cohort == "Cohort_4" ~ temperatures$Summer_max_prior[4],
    Cohort == "Cohort_5" ~ temperatures$Summer_max_prior[5]
  ))

> long_density2[1:7,]
# A tibble: 7 x 6
   Year Cohort   Density Summer_max_prior Winter_min_post Summer_prior_2
  <dbl> <chr>      <dbl>            <dbl>           <dbl>          <dbl>
1  2021 Cohort_1      12                2               1              2
2  2021 Cohort_2      23                2               1              3
3  2021 Cohort_3      32                2               1              4
4  2021 Cohort_4      44                2               1              6
5  2021 Cohort_5      56                2               1              4
6  2021 Cohort_6      66                2               1             NA
7  2021 Cohort_7      77                2               1             NA

Another option: giving each cohort the desired temperature in the temperatures data and then merge with long_density:

Cohort_names <- data.frame(Cohort = paste0("Cohort_",c(1:nrow(df)))) # create a column which contains the cohort names depending on the number of rows of the wide data

temperatures <- long_density %>%
  group_by(Year) %>% filter(row_number()==1) %>%
  select(!c(Density,Cohort)) %>%
  cbind(., Cohort_names) # the dot refers to the piped data; here, we bind the colums from Cohort_names and temperatures

> temperatures
# A tibble: 5 x 4
# Groups:   Year [5]
   Year Summer_max_prior Winter_min_post Cohort  
  <dbl>            <dbl>           <dbl> <chr>   
1  2021                2               1 Cohort_1
2  2020                3               2 Cohort_2
3  2019                4               0 Cohort_3
4  2018                6               3 Cohort_4
5  2017                4               1 Cohort_5


long_density3 <- long_density %>% 
  rename(Summer_max_prior_old = Summer_max_prior, # rename the old columns 
         Winter_min_post_old = Winter_min_post) %>%
  merge(., temperatures, by="Cohort") %>% # merge with temperatures data by Cohort (the variable Cohort is present in both data long_density and temperatures)
  rename(Year = Year.x) %>% # just rename the Year variable as it appears with an x after merging
  arrange(desc(Year)) %>% # sort data by Year in descending order
  select(!c(Year.y,Summer_max_prior_old, Winter_min_post_old)) 

> long_density3[c(1:7),]
    Cohort Year Density Summer_max_prior Winter_min_post
1 Cohort_1 2021      12                2               1
2 Cohort_2 2021      23                3               2
3 Cohort_3 2021      32                4               0
4 Cohort_4 2021      44                6               3
5 Cohort_5 2021      56                4               1
6 Cohort_1 2020      13                2               1
7 Cohort_2 2020      22                3               2

I hope this a suitable solution and fits your desired output. If not please let me know.

Upvotes: 0

Related Questions