Reputation: 137
I have a specific code that I want to write in R that I couldn't find an answer to on Stack Overflow. I am manipulating a dataset of continents data and am looking to calculate cumulative values for each year. This is a snapshot of what the df looks like:
Continent Year Value Cumulative Value
<chr> <dbl> <dbl> <dbl>
1 Europe 2000. 10. 10.
2 Asia 2000. 30. 30.
3 Africa 2000. 67. 67.
4 N. America 2000. 23. 23.
5 S. America 2000. 19. 19.
6 Europe 2001. 3. 13.
7 Asia 2001. 4. 34.
8 Africa 2001. 3. 70.
9 Europe 2002. 3. 16.
10 Asia 2002. 9. 43.
11 Africa 2002. 2. 72.
12 N. America 2002. 4. 27.
13 S. America 2002. 90. 109.
My issue is that not every continent has a value every year, yet I still need the cumulative value for that year. The cumulative value for that year would be the same for that specific continent as the previous year.
For example, in 2001, N. America and S. America do not have a row, and I would like both to show up with value = 0 and cumulative value as 23 and 19, respectively, the same as the previous year (in year 2000). I am unsure what code would accomplish this so any advice would be greatly appreciated.
Continent Year Value Cumulative Value
N. America 2001. 0. 23.
S. America 2001. 0. 19.
Let me know if I should provide more information and thanks again!
data
structure(list(Continent = c("Europe", "Asia", "Africa", "N. America",
"S. America", "Europe", "Asia", "Africa", "Europe", "Asia", "Africa",
"N. America", "S. America"), Year = c(2000, 2000, 2000, 2000,
2000, 2001, 2001, 2001, 2002, 2002, 2002, 2002, 2002), Value = c(10,
30, 67, 23, 19, 3, 4, 3, 3, 9, 2, 4, 90), `Cumulative Value` = c(10,
30, 67, 23, 19, 13, 34, 70, 16, 43, 72, 27, 109)), .Names = c("Continent",
"Year", "Value", "Cumulative Value"), row.names = c(NA, -13L), class = c("tbl_df",
"tbl", "data.frame"))
Upvotes: 0
Views: 70
Reputation: 10432
Here's a tidyverse
option:
library(tidyverse)
df %>%
complete(Continent, Year) %>%
replace_na(list(Value = 0)) %>%
fill(Cumulative)
# A tibble: 15 x 4
Continent Year Value Cumulative
<chr> <int> <dbl> <int>
1 Africa 2000 67 67
2 Africa 2001 3 70
3 Africa 2002 2 72
4 Asia 2000 30 30
5 Asia 2001 4 34
6 Asia 2002 9 43
7 Europe 2000 10 10
8 Europe 2001 3 13
9 Europe 2002 3 16
10 N. America 2000 23 23
11 N. America 2001 0 23
12 N. America 2002 4 27
13 S. America 2000 19 19
14 S. America 2001 0 19
15 S. America 2002 90 109
Upvotes: 1
Reputation: 146129
This should work, but is untested since your data isn't shared in a copy/pasteable way. Share dput(your_sample_data)
and I will test/debug.
library(dplyr)
library(tidyr)
complete(your_data, Continent, Year, fill = list(Value = 0)) %>%
group_by(Continent) %>%
mutate(`Cumulative Value` = zoo::na.locf(`Cumulative Value`))
# A tibble: 15 x 4
# Groups: Continent [5]
Continent Year Value CV
<chr> <dbl> <dbl> <dbl>
1 Africa 2000 67 67
2 Africa 2001 3 70
3 Africa 2002 2 72
4 Asia 2000 30 30
5 Asia 2001 4 34
6 Asia 2002 9 43
7 Europe 2000 10 10
8 Europe 2001 3 13
9 Europe 2002 3 16
10 N. America 2000 23 23
11 N. America 2001 0 23
12 N. America 2002 4 27
13 S. America 2000 19 19
14 S. America 2001 0 19
15 S. America 2002 90 109
Upvotes: 2