Reputation: 21
I would like to create a new variable in a dataframe in R, essentially using the equivalent of a "sumif" function in excel.
I have a dataframe which looks like this (but is much larger):
Country Year Number of IMF programs during the year
AFG 2000 1
ARG 2000 1
AFG 2001 0
ARG 2001 1
I would like to create a new variable "number of IMF programs over the period": the sum of the number of programs in the country, over the period.
In the example, we can see from the column "IMF programs during the year" that Afghanistan had 1 program over the period (1 in 2000, 0 in 2001), and Argentina had 2. so the table would look like this:
Country Year IMF programs during the year N of programs over the period
BRA 2000 1 1
ARG 2000 1 2
BRA 2001 0 1
ARG 2001 1 2
I have seen tutorials, but they only show how to create subsets of observations matching certain criteria, and then calculate sums within the subsets. I would be able to do this, but it is not what I need. I would need another variable...
The intuition behind the function would be: sum "number of programs in the year" if ("country" = "the country that corresponds to this row").
Would you be able to provide me with some guidance ?
Many thanks for your help !
Upvotes: 2
Views: 243
Reputation: 887128
We can use aggregate
from base R
aggregate(IMFPrograms ~ country, df, FUN = sum)
If we need to create a column, use ave
df$NPrograms <- with(df, ave(IMFPrograms, country, FUN = sum))
Or using rowsum
rowsum(df$IMFPrograms, df$country)
Upvotes: 1
Reputation: 21274
library(tidyverse)
want <- df %>%
group_by(country) %>%
mutate(NPrograms = sum(IMFPrograms))
Upvotes: 3