Reputation: 33
Some sample data:
x <- data.frame(c(1992, 1992, 1992, 1994, 1994, 1995, 1992, 1992, 1993), c("Taliban", "Taliban", "Taliban", "Taliban", "Taliban", "Taliban", "Afghanistan", "Afghanistan", "Afghanistan"), c(300, 300, 300, 100, 100, 250, 25, 25, 60))
colnames(x) <- c("year", "actor", "deaths")
x$year <- as.integer(x$year) # this is to match the class of my actual data
My goal is to create and populate a new column "even_deaths" with a value based on the following conditions: if more than one row where the year and actor match, then "even_deaths" will be the number of "deaths" divided by the number of duplicate rows.
In short, I want the new dataframe to look like this:
year actor deaths even_deaths
1992 Taliban 300 100
1992 Taliban 300 100
1992 Taliban 300 100
1994 Taliban 100 50
1994 Taliban 100 50
1995 Taliban 250 250
1992 Afghanistan 25 12.5
1992 Afghanistan 25 12.5
1993 Afghanistan 60 60
The dataset is particularly large with with over 1k actors so I'm hoping wouldn't need to specify each individual one. Also, ideally I could perform whatever action on just the rows that have duplicates (as opposed to just duplicate and unique rows). Any help is very much appreciated and I apologize if the wording is vague.
Cheers,
Ardeshir
Upvotes: 0
Views: 98
Reputation: 388907
You can use base R
function ave
x$even_deaths <- ave(x$deaths, x$year, x$actor, FUN = function(x) x/length(x))
x
# year actor deaths even_deaths
#1 1992 Taliban 300 100.0
#2 1992 Taliban 300 100.0
#3 1992 Taliban 300 100.0
#4 1994 Taliban 100 50.0
#5 1994 Taliban 100 50.0
#6 1995 Taliban 250 250.0
#7 1992 Afghanistan 25 12.5
#8 1992 Afghanistan 25 12.5
#9 1993 Afghanistan 60 60.0
Upvotes: 1