Reputation: 861
I have a data set like this:
sum_col  city  scen   model  time_period  chill_season
110.02   NY    RCP_8  bcc    2076_2099    season_2085_2086
91.26    NY    RCP_8  bcc    2076_2099    season_2086_2087
91.05    NY    RCP_8  bcc    2076_2099    season_2087_2088
74.96    NY    RCP_8  bcc    2076_2099    season_2088_2089
77.97    NY    RCP_8  bcc    2076_2099    season_2089_2090
109.05   NY    RCP_8  bcc    2076_2099    season_2090_2091
I want to cut the sum_col column and count how many times the values fall within each interval defined by bks = c(-300, seq(20, 75, 5), 300).
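As a quick check of what these breaks produce, here is a minimal sketch. The vector x is just the six sum_col values from the sample above, and the variable names are only for illustration:

```r
# Breaks as defined above: 14 cut points give 13 right-closed intervals.
bks <- c(-300, seq(20, 75, 5), 300)

# The six sum_col values from the sample data.
x <- c(110.02, 91.26, 91.05, 74.96, 77.97, 109.05)

# cut() returns a factor whose levels include every interval,
# even the ones that no value falls into.
thresh_range <- cut(x, breaks = bks)

levels(thresh_range)  # all interval labels, e.g. "(20,25]", "(25,30]"
table(thresh_range)   # counts per interval, including zeros for empty ones
```

So the factor itself still knows about the empty intervals; they only disappear once the data is grouped and summarized.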
However, when I try the following:
library(dplyr)
library(data.table)
result <- dt %>%
  mutate(thresh_range = cut(sum_col, breaks = bks)) %>%
  group_by(time_period, thresh_range, model, scen, city) %>%
  summarize(no_years = n_distinct(chill_season, na.rm = FALSE)) %>%
  data.table()
my result looks like:
time_period  thresh_range  model  scen   city  no_years
2076_2099    (70,75]       bcc    RCP_8  NY    1
2076_2099    (75,300]      bcc    RCP_8  NY    5
So, the intervals below 70, e.g. (20,25] and (25,30], are not created (because there is no row in the data that falls within those intervals).
Is there any way to tell cut to return zero for those intervals?
Please note, again, that a row similar to the following:
a_value_less_than_70_here NY RCP_8 bcc 2076_2099 chill_2076_2077
whose corresponding sum_col is less than 70, does not exist in the data. However, I was wondering whether, for such non-existent data, cut can create a 0 or NA that tells us the temperature of NY with those parameters indeed did not fall in the (20,25] interval.
The bottom line is that I want to see how many years each city, with a given set of parameters (model, scen, etc.), falls within each interval: (20,25], (25,30], etc.
If any suggestion other than cut works, that is great as well.
Upvotes: 1
Views: 293
Reputation: 13108
You can use the complete function from the tidyr package to create NA rows for missing combinations of data:
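library(dplyr)
library(tidyr)
result <- dt %>%
  mutate(thresh_range = cut(sum_col, breaks = bks)) %>%
  # complete() adds a row (filled with NA) for every missing combination;
  # for a factor such as thresh_range it uses all levels, not just observed ones
  complete(time_period, thresh_range, model, scen, city) %>%
  group_by(time_period, thresh_range, model, scen, city) %>%
  # na.rm = TRUE so the NA-only groups added by complete() count as 0
  summarize(no_years = n_distinct(chill_season, na.rm = TRUE))
result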
# # A tibble: 13 x 6
# # Groups:   time_period, thresh_range, model, scen [?]
#    time_period thresh_range model scen  city  no_years
#    <chr>       <fct>        <chr> <chr> <chr>    <int>
#  1 2076_2099   (-300,20]    bcc   RCP_8 NY           0
#  2 2076_2099   (20,25]      bcc   RCP_8 NY           0
#  3 2076_2099   (25,30]      bcc   RCP_8 NY           0
#  4 2076_2099   (30,35]      bcc   RCP_8 NY           0
#  5 2076_2099   (35,40]      bcc   RCP_8 NY           0
#  6 2076_2099   (40,45]      bcc   RCP_8 NY           0
#  7 2076_2099   (45,50]      bcc   RCP_8 NY           0
#  8 2076_2099   (50,55]      bcc   RCP_8 NY           0
#  9 2076_2099   (55,60]      bcc   RCP_8 NY           0
# 10 2076_2099   (60,65]      bcc   RCP_8 NY           0
# 11 2076_2099   (65,70]      bcc   RCP_8 NY           0
# 12 2076_2099   (70,75]      bcc   RCP_8 NY           1
# 13 2076_2099   (75,300]     bcc   RCP_8 NY           5
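As a possible alternative that avoids tidyr, dplyr's count() accepts .drop = FALSE, which keeps empty factor levels as zero-count groups. The sketch below is only an illustration under a few assumptions: the data frame is rebuilt by hand from the six sample rows, the column name no_rows is made up, and it counts rows rather than distinct chill_season values (the two coincide here because every row is a different season). The factor column is placed last so that its levels are expanded within each observed combination of the other columns.

```r
library(dplyr)

# Sample data rebuilt from the question (assumed to match the real dt).
dt <- data.frame(
  sum_col      = c(110.02, 91.26, 91.05, 74.96, 77.97, 109.05),
  city         = "NY",
  scen         = "RCP_8",
  model        = "bcc",
  time_period  = "2076_2099",
  chill_season = paste0("season_", 2085:2090, "_", 2086:2091),
  stringsAsFactors = FALSE
)
bks <- c(-300, seq(20, 75, 5), 300)

counts <- dt %>%
  mutate(thresh_range = cut(sum_col, breaks = bks)) %>%
  # .drop = FALSE keeps intervals with no rows; they get a count of 0
  count(time_period, model, scen, city, thresh_range,
        .drop = FALSE, name = "no_rows")

counts
```

If seasons could repeat within a group, group_by(..., .drop = FALSE) followed by summarize(n_distinct(chill_season)) should give distinct-season counts in the same way.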
Upvotes: 2