Reputation: 1
I am new to R and having trouble trying to execute my plan.
I am trying to add another column to my data frame with the values coming from the rows below. However, each row must come from a different number of rows below.
I have 6-7 age cohorts and temperature values for multiple years. Cohort 1 will have that year's temperature value, cohort 2 will have the year prior, cohort 3 will have the temperature value 2 years prior, etc.
This is the data in a wide format. Wide format data. I will need the final output in a long format, such as long format
This is my current workings
long_density %>%
mutate( Summer_prior_2 = Summer_max_prior) %>%
mutate(Summer_prior_2 = c(Summer_max_prior[-1]))
select(Year, Cohort, Density, Summer_max_prior, Summer_prior_2, Winter_min_post, Summer_max_post) %>%
View
long_density %>%
mutate( Summer_prior_2 = Summer_max_prior) %>%
mutate(Summer_prior_2 = ifelse("Cohort" == "Cohort_1", Summer_prior_2,
if_else("Cohort" == "Cohort_2",Summer_prior_2[-7], NA))) %>%
View
I was thinking a "ifelse" code where cohort_1 = Cohort_1, Cohort_2= Cohort_2 - 7, Cohort_3= Cohort_3 - 14, Cohort_4= Cohort_4 - 21, etc.. As there are 7 cohorts in in each year the value I want is increasing by 7 for each cohort.
EDIT
Sorry, my original question wasn't too clear.
I know how to pivot_longer. It's mutating the temperature data depending on the cohort that I am having difficulties with.
Here is an example data frame:
df <- data.frame (Year<- as.numeric (c("2021","2020","2019","2018","2017")),
Cohort_1 <- as.numeric (c("12", "13", "12", "14", "20")),
Cohort_2 <- as.numeric (c("23", "22", "23", "26", "29")),
Cohort_3 <- as.numeric (c("32", "32", "40", "35", "34")),
Cohort_4 <- as.numeric (c("44", "43", "40", "49", "46")),
Cohort_5 <- as.numeric (c("56", "49", "41", "50", "55")),
Cohort_6 <- as.numeric (c("66", "61", "62", "69", "68")),
Cohort_7 <- as.numeric (c("77", "90", "82", "84", "79")),
Summer_max_prior <- as.numeric (c("2","3","4","6","4")),
Winter_min_post <- as.numeric (c("1","2","0","3","1")))
> df
Year Cohort_1 Cohort_2 Cohort_3 Cohort_4 Cohort_5 Cohort_6 Cohort_7 Summer_max Winter_min
1 2021 12 23 32 44 56 66 77 2 1
2 2020 13 22 32 43 49 61 90 3 2
3 2019 12 23 40 40 41 62 82 4 0
4 2018 14 26 35 49 50 69 84 6 3
5 2017 20 29 34 46 55 68 79 4 1
This is the pivot_longer code.
long_density <- pivot_longer(df,cols=c("Cohort_1","Cohort_2","Cohort_3","Cohort_4","Cohort_5","Cohort_6","Cohort_7"), names_to= "Cohort",
values_to= "Density")
> long_density %>% select(Year, Cohort, Density, Summer_max, Winter_min)
Year Cohort Density Summer_max Winter_min
1 2021 Cohort_1 12 2 1
2 2021 Cohort_2 23 2 1
3 2021 Cohort_3 32 2 1
4 2021 Cohort_4 44 2 1
5 2021 Cohort_5 56 2 1
6 2021 Cohort_6 66 2 1
7 2021 Cohort_7 77 2 1
8 2020 Cohort_1 13 3 2
9 2020 Cohort_2 22 3 2
10 2020 Cohort_3 32 3 2
I need both Summer_max and Winter_min data for cohort 2 (2-year-olds) to be from the year before, cohort 3 (3-year-olds) to be 3 years before, and so on. I need the corresponding temperatures for the year that the cohort of fish spawned.
At the moment all values for each cohort within each year are the same. This is the code I have written so far but r does not like my ifelse code.
long_density %>%
mutate(Summer_prior_2 = Summer_max_prior) %>%
mutate(Summer_prior_2 = ifelse(long_density$Cohort == "Cohort_1", Summer_max_prior,
if_else(long_density$Cohort == "Cohort_2",Summer_max_prior[-7],
if_else(long_density$Cohort == "Cohort_3",Summer_max_prior[-14],
if_else(long_density$Cohort == "Cohort_4",Summer_max_prior[-21],
if_else(long_density$Cohort == "Cohort_5",Summer_max_prior[-28],
if_else(long_density$Cohort == "Cohort_6",Summer_max_prior[-35],
if_else(long_density$Cohort == "Cohort_7",Summer_max_prior[-42], NA))))))))
Any help would be greatly appreciated!
Upvotes: 0
Views: 545
Reputation: 118
I hope I understand the question more clearly now. I added some comments to try to explain what the functions do:
Preprocessing:
library(tidyverse, warn.conflicts = FALSE)
df <- data.frame(Year = c(2021,2020,2019,2018,2017), # if you remove the quotes, R will automatically identify the data as numeric values
Cohort_1 = c(12, 13, 12, 14, 20),
Cohort_2 = c(23, 22, 23, 26, 29),
Cohort_3 = c(32, 32, 40, 35, 34),
Cohort_4 = c(44, 43, 40, 49, 46),
Cohort_5 = c(56, 49, 41, 50, 55),
Cohort_6 = c(66, 61, 62, 69, 68),
Cohort_7 = c(77, 90, 82, 84, 79),
Summer_max_prior = c(2,3,4,6,4),
Winter_min_post = c(1,2,0,3,1))
long_density <- df %>%
pivot_longer(cols=starts_with("Cohort_"), # useful helper function (starts_with) to avoid using a long character vector as in your example; more helper functions here: https://dplyr.tidyverse.org/reference/select.html
names_to = "Cohort",
values_to = "Density") %>%
select(Year, Cohort, Density, Summer_max_prior, Winter_min_post)
> long_density
# A tibble: 35 x 5
Year Cohort Density Summer_max_prior Winter_min_post
<dbl> <chr> <dbl> <dbl> <dbl>
1 2021 Cohort_1 12 2 1
2 2021 Cohort_2 23 2 1
3 2021 Cohort_3 32 2 1
4 2021 Cohort_4 44 2 1
5 2021 Cohort_5 56 2 1
6 2021 Cohort_6 66 2 1
7 2021 Cohort_7 77 2 1
8 2020 Cohort_1 13 3 2
9 2020 Cohort_2 22 3 2
10 2020 Cohort_3 32 3 2
# ... with 25 more rows
One option is to generate a separate data.frame that contains the temperatures of each year and match it with the long_density
data:
temperatures <- long_density %>%
group_by(Year) %>% # group the data for year
filter(row_number()==1) %>% # filter for the first row of each group (Year)
select(!c(Density,Cohort)) # remove Density and Cohort
> temperatures
# A tibble: 5 x 3
# Groups: Year [5]
Year Summer_max_prior Winter_min_post
<dbl> <dbl> <dbl>
1 2021 2 1
2 2020 3 2
3 2019 4 0
4 2018 6 3
5 2017 4 1
long_density2 <- long_density %>%
mutate(Summer_prior_2 = case_when( # vectorized if-function: https://dplyr.tidyverse.org/reference/case_when.html
Cohort == "Cohort_1" ~ temperatures$Summer_max_prior[1], # if Cohort is equal to Cohort_1 set the Summer_max_prior from the first row from the Temperatures data
Cohort == "Cohort_2" ~ temperatures$Summer_max_prior[2],
Cohort == "Cohort_3" ~ temperatures$Summer_max_prior[3],
Cohort == "Cohort_4" ~ temperatures$Summer_max_prior[4],
Cohort == "Cohort_5" ~ temperatures$Summer_max_prior[5]
))
> long_density2[1:7,]
# A tibble: 7 x 6
Year Cohort Density Summer_max_prior Winter_min_post Summer_prior_2
<dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 2021 Cohort_1 12 2 1 2
2 2021 Cohort_2 23 2 1 3
3 2021 Cohort_3 32 2 1 4
4 2021 Cohort_4 44 2 1 6
5 2021 Cohort_5 56 2 1 4
6 2021 Cohort_6 66 2 1 NA
7 2021 Cohort_7 77 2 1 NA
Another option: giving each cohort the desired temperature in the temperatures data and then merge with long_density
:
Cohort_names <- data.frame(Cohort = paste0("Cohort_",c(1:nrow(df)))) # create a column which contains the cohort names depending on the number of rows of the wide data
temperatures <- long_density %>%
group_by(Year) %>% filter(row_number()==1) %>%
select(!c(Density,Cohort)) %>%
cbind(., Cohort_names) # the dot refers to the piped data; here, we bind the colums from Cohort_names and temperatures
> temperatures
# A tibble: 5 x 4
# Groups: Year [5]
Year Summer_max_prior Winter_min_post Cohort
<dbl> <dbl> <dbl> <chr>
1 2021 2 1 Cohort_1
2 2020 3 2 Cohort_2
3 2019 4 0 Cohort_3
4 2018 6 3 Cohort_4
5 2017 4 1 Cohort_5
long_density3 <- long_density %>%
rename(Summer_max_prior_old = Summer_max_prior, # rename the old columns
Winter_min_post_old = Winter_min_post) %>%
merge(., temperatures, by="Cohort") %>% # merge with temperatures data by Cohort (the variable Cohort is present in both data long_density and temperatures)
rename(Year = Year.x) %>% # just rename the Year variable as it appears with an x after merging
arrange(desc(Year)) %>% # sort data by Year in descending order
select(!c(Year.y,Summer_max_prior_old, Winter_min_post_old))
> long_density3[c(1:7),]
Cohort Year Density Summer_max_prior Winter_min_post
1 Cohort_1 2021 12 2 1
2 Cohort_2 2021 23 3 2
3 Cohort_3 2021 32 4 0
4 Cohort_4 2021 44 6 3
5 Cohort_5 2021 56 4 1
6 Cohort_1 2020 13 2 1
7 Cohort_2 2020 22 3 2
I hope this a suitable solution and fits your desired output. If not please let me know.
Upvotes: 0