user6780841
user6780841

Reputation: 33

For rows with duplicates, create new column with conditional value

Some sample data:

x <- data.frame(c(1992, 1992, 1992, 1994, 1994, 1995, 1992, 1992, 1993), c("Taliban", "Taliban", "Taliban", "Taliban", "Taliban", "Taliban", "Afghanistan", "Afghanistan", "Afghanistan"), c(300, 300, 300, 100, 100, 250, 25, 25, 60))
colnames(x) <- c("year", "actor", "deaths")
x$year <- as.integer(x$year) # this is to match the class of my actual data

My goal is to create and populate a new column "even_deaths" with a value based on the following conditions: if more than one row where the year and actor match, then "even_deaths" will be the number of "deaths" divided by the number of duplicate rows.

In short, I want the new dataframe to look like this:

year          actor          deaths          even_deaths
1992          Taliban        300             100
1992          Taliban        300             100
1992          Taliban        300             100
1994          Taliban        100             50
1994          Taliban        100             50
1995          Taliban        250             250
1992          Afghanistan    25              12.5
1992          Afghanistan    25              12.5
1993          Afghanistan    60              60

The dataset is particularly large with with over 1k actors so I'm hoping wouldn't need to specify each individual one. Also, ideally I could perform whatever action on just the rows that have duplicates (as opposed to just duplicate and unique rows). Any help is very much appreciated and I apologize if the wording is vague.

Cheers,

Ardeshir

Upvotes: 0

Views: 98

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388907

You can use base R function ave

x$even_deaths <- ave(x$deaths, x$year, x$actor, FUN = function(x) x/length(x))
x

#  year       actor deaths even_deaths
#1 1992     Taliban    300       100.0
#2 1992     Taliban    300       100.0
#3 1992     Taliban    300       100.0
#4 1994     Taliban    100        50.0
#5 1994     Taliban    100        50.0
#6 1995     Taliban    250       250.0
#7 1992 Afghanistan     25        12.5
#8 1992 Afghanistan     25        12.5
#9 1993 Afghanistan     60        60.0

Upvotes: 1

Related Questions