shan
shan

Reputation: 583

Merging two dataframes, removing duplicates and aggregation in R

I have two dataframes in R named house and candidates.

house

      House       Region                 Military_Strength
1 Stark           The North              20000
2 Targaryen       Slaver's Bay           110000
3 Lannister       The Westerlands        60000
4 Baratheon       The Stormlands         40000
5 Tyrell          The Reach              30000


candidates

  House               Name                  Region
1 Lannister           Jamie Lannister       Westros
2 Stark               Robb Stark            North
3 Stark               Arya Stark            Westros
4 Lannister           Cersi Lannister       Westros
5 Targaryen           Daenerys Targaryen    Mereene
6 Baratheon           Robert Baratheon      Westros
7 Mormont             Jorah Mormont         Mereene

I want to merge the two dataframes on the basis of house. For that I have done:

merge(candidates, house, by="House", sort=FALSE)

The output is :

       House        Name         Region.x        Region.y   Military_Strength
 1 Lannister    Jamie Lannister  Westros     The Westerlands             60000
 2 Lannister    Cersi Lannister  Westros     The Westerlands             60000
 3 Stark         Robb Stark      North       The North                   20000
 4 Stark         Arya Stark      Westros     The North                   20000
 5 Targaryen Daenerys Targaryen  Mereene     Slaver's Bay                110000
 6 Baratheon   Robert Baratheon  Westros     The Stormlands              40000

I want to remove the second Name candidate from every house(if any), but its Military_Strength should be added up to the first candidate of the same house.

for eg:

4 Stark         Arya Stark      Westros     The North                   20000

would be removed but, 20000 would be added up to row3 Robb Stark Military_Strength. How to do it in appropriate way?

Upvotes: 1

Views: 109

Answers (1)

RHertel
RHertel

Reputation: 23788

Starting from the data.frame df1 obtained after merge(), one could proceed with:

df1$Military_Strength <- with(df1, ave(Military_Strength, House, FUN=sum))
df1[!duplicated(df1$House),]
#      House               Name Region.x        Region.y Military_Strength
#1 Lannister    Jamie Lannister  Westros The Westerlands            120000
#3     Stark         Robb Stark    North       The North             40000
#5 Targaryen Daenerys Targaryen  Mereene    Slaver's Bay            110000
#6 Baratheon   Robert Baratheon  Westros  The Stormlands             40000

data used in this example:

df1 <- structure(list(House = structure(c(2L, 2L, 3L, 3L, 4L, 1L), 
                .Label = c("Baratheon", "Lannister", "Stark", "Targaryen"), 
                class = "factor"), Name = structure(c(4L, 2L, 5L, 1L, 3L, 6L), 
                .Label = c("Arya Stark", "Cersi Lannister", "Daenerys Targaryen", 
                "Jamie Lannister", "Robb Stark", "Robert Baratheon"), 
                class = "factor"), Region.x = structure(c(3L, 3L, 2L, 3L, 1L, 3L), 
                .Label = c("Mereene", "North", "Westros"), class = "factor"), 
                Region.y = structure(c(4L, 4L, 2L, 2L, 1L, 3L), 
                .Label = c("Slaver's Bay", "The North", "The Stormlands",
                  "The Westerlands"), class = "factor"), 
                Military_Strength = c(60000L, 60000L, 20000L, 20000L, 110000L, 
                40000L)), .Names = c("House", "Name", "Region.x", "Region.y", 
                "Military_Strength"), class = "data.frame", row.names = c("1", 
                "2", "3", "4", "5", "6"))

Upvotes: 1

Related Questions