Reputation: 3
I am wondering how users on here would go about creating a new dichotomous variable in a dataframe based on whether a value in another variable falls above or below that variable's yearly average. I have tried checking for similar answers, but while I have uncovered the recommendation to use the aggregate function to generate the means on groups in a dataframe, that does not fully address my needs in this case.
Specifically, I have a spatially lagged variable (already constructed), and I want to make a dichotomous variable that captures whether a state (id = COW) falls above or below the yearly average of that spatially lagged variable.
This is not the actual data I am working with, but a simplified version that should convey its structure. The actual dataframe contains many other covariates and states. The Year variable covers every year from 1967 to 2018 (inclusive). The number of states is not equal across years, because I have removed state entries prior to a state's formal entrance into the international state system (e.g. South Sudan, which entered after 1967) or after its formal exit from it (e.g. Czechoslovakia):
COW Year SL_UN_ICCPR
2 1967 0
20 1967 0
31 1967 0
40 1967 0
...
2 1968 0
20 1968 1.2
31 1968 1.5
...
2 1980 4.6
20 1980 3.7
31 1980 3.0
...
900 2018 5.10
910 2018 2.6
920 2018 1.5
I want to produce output like this:
COW Year SL_UN_ICCPR Dichotomous
2 1967 0 0
20 1967 0 0
31 1967 0 0
40 1967 0 0
...
2 1968 0 0
20 1968 1.2 0
31 1968 1.5 1 #(assuming yearly mean = 1.4)
...
2 1980 4.6 1
20 1980 3.7 1
31 1980 3.0 0 #(assuming yearly mean = 3.1)
...
900 2018 5.10 1
910 2018 2.6 0 #(assuming yearly mean = 3.2)
920 2018 1.5 0
I've tried grouping the data by Year with group_by but the following code is not producing the desired result:
Data <- group_by(Data, Year)
Data <- mutate(Data, Spatial_Dummy_ICCPR = ifelse(SL_UN_ICCPR > mean(SL_UN_ICCPR) , 1, 0))
This produces a dichotomous variable without the desired grouping by year, instead mutating based on the overall variable mean. Can anyone give me some direction on where I am going wrong?
Upvotes: 0
Views: 254
Reputation: 73622
You could use base R's ave() to create a variable with the yearly means, on which you then apply ifelse(), conveniently inside within().
# uses the example data frame d defined at the end of this answer
d <- within(d, {
  SL_UN_ICCPR.mean <- ave(SL_UN_ICCPR, Year, FUN = mean)
  Spatial_Dummy_ICCPR <- ifelse(SL_UN_ICCPR > SL_UN_ICCPR.mean, 1, 0)
})
# COW Year SL_UN_ICCPR Spatial_Dummy_ICCPR SL_UN_ICCPR.mean
# 1 2 1967 0.0 0 0.000000
# 2 20 1967 0.0 0 0.000000
# 3 31 1967 0.0 0 0.000000
# 4 40 1967 0.0 0 0.000000
# 5 2 1968 0.0 0 0.900000
# 6 20 1968 1.2 1 0.900000
# 7 31 1968 1.5 1 0.900000
# 8 2 1980 4.6 1 3.766667
# 9 20 1980 3.7 0 3.766667
# 10 31 1980 3.0 0 3.766667
# 11 900 2018 5.1 1 3.066667
# 12 910 2018 2.6 0 3.066667
# 13 920 2018 1.5 0 3.066667
d <- structure(list(COW = c(2L, 20L, 31L, 40L, 2L, 20L, 31L, 2L, 20L,
31L, 900L, 910L, 920L), Year = c(1967L, 1967L, 1967L, 1967L,
1968L, 1968L, 1968L, 1980L, 1980L, 1980L, 2018L, 2018L, 2018L
), SL_UN_ICCPR = c(0, 0, 0, 0, 0, 1.2, 1.5, 4.6, 3.7, 3, 5.1,
2.6, 1.5)), row.names = c(NA, -13L), class = "data.frame")
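If the intermediate column of yearly means is not needed, the same idea can be written as a single comparison. This is only a minimal sketch on the toy data above: it relies on mean being the default FUN of ave(), and it stores the dummy as an integer rather than a double, which is a choice made here and not something the question asks for.

```r
# Toy data, same values as the example data frame in this answer
d <- data.frame(
  COW = c(2L, 20L, 31L, 40L, 2L, 20L, 31L, 2L, 20L, 31L, 900L, 910L, 920L),
  Year = c(1967L, 1967L, 1967L, 1967L, 1968L, 1968L, 1968L,
           1980L, 1980L, 1980L, 2018L, 2018L, 2018L),
  SL_UN_ICCPR = c(0, 0, 0, 0, 0, 1.2, 1.5, 4.6, 3.7, 3, 5.1, 2.6, 1.5)
)

# ave() returns each row's group (yearly) mean, aligned with the rows of d;
# the logical comparison is then converted to 0/1
d$Spatial_Dummy_ICCPR <- as.integer(d$SL_UN_ICCPR > ave(d$SL_UN_ICCPR, d$Year))
```

Note that if SL_UN_ICCPR contains missing values, the yearly mean for that year becomes NA. Passing something like FUN = function(x) mean(x, na.rm = TRUE) to ave() would be one way to handle that, depending on how missing values should be treated in the actual data.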
Upvotes: 0
Reputation: 283
You need to first create the average by year, then ungroup and finally create your dummy. Something like this should work:
library(tidyverse)
Data %>%
  group_by(Year) %>%
  mutate(avg_year = mean(SL_UN_ICCPR)) %>%
  ungroup() %>%
  mutate(Spatial_Dummy_ICCPR = ifelse(SL_UN_ICCPR > avg_year, 1, 0))
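The comparison can also be done inside a single grouped mutate(), because mean() is evaluated per group while the data is grouped. One possible reason the code in the question returns the overall mean is that another package loaded after dplyr (plyr is a common one) masks mutate(), and plyr's version ignores the grouping. That is an assumption, since the question does not show which packages are loaded. The sketch below uses the toy data from the question and calls the dplyr functions explicitly to rule that out.

```r
library(dplyr)

# Toy data with the same structure as in the question
Data <- data.frame(
  COW = c(2L, 20L, 31L, 40L, 2L, 20L, 31L, 2L, 20L, 31L, 900L, 910L, 920L),
  Year = c(1967L, 1967L, 1967L, 1967L, 1968L, 1968L, 1968L,
           1980L, 1980L, 1980L, 2018L, 2018L, 2018L),
  SL_UN_ICCPR = c(0, 0, 0, 0, 0, 1.2, 1.5, 4.6, 3.7, 3, 5.1, 2.6, 1.5)
)

Data <- Data %>%
  dplyr::group_by(Year) %>%
  # mean() here is the mean within each Year, because the data is grouped
  dplyr::mutate(Spatial_Dummy_ICCPR = as.integer(SL_UN_ICCPR > mean(SL_UN_ICCPR))) %>%
  # drop the grouping so later operations work on the whole data frame
  dplyr::ungroup()
```

Running conflicts() or typing mutate at the console (to see which namespace it comes from) is a quick way to check whether masking is actually the problem in a given session.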
Upvotes: 0