Mackendrick
Mackendrick

Reputation: 15

Creating a column with differences based on another column

I have a data frame that looks like this (simplified from 699 treaties):

TRT <- data.frame(T.ID=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,8),
              Treaty=c("hungary slovenia 1994", "hungary slovenia 1994",
                                "nicaragua taiwan 2006", "nicaragua taiwan 2006",
                                "ukraine uzbekistan 1994", "ukraine uzbekistan 1994",
                                "brazil uruguay 1986", "brazil uruguay 1986",
                                "albania macedonia 2002", "albania macedonia 2002",
                                "albania moldova 2003", "albania moldova 2003",
                                "albania romania 2003", "albania romania 2003",
                                "Treaty of Izmir 1977","Treaty of Izmir 1977",
                                "Treaty of Izmir 1977"),
              sc.y=c("HUN1994", "SLV1994", "NIC2006", "TAW2006", "UKR1994", 
                     "UZB1994", "BRA1986", "URU1986", "ALB2002", "MAC2002", 
                     "ALB2003", "MLD2003", "ALB2003", "RUM2003", "IRN1977", 
                     "TUR1977", "PAK1977"),
              prom.demo=c(1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0),
              polity=c(10,10,8,10,7,-9,7,9,7,9,7,8,7,8,-10,-7,9))

In the end, I want to have a data frame that lists each treaty only once, its value of the “prom.demo”-column and one column that contains the difference of the maximum and minimum among the “polity”-values of the contracting parties of each treaty (most treaties have only two contracting parties, but some have up to 51). Is there any R command that spares me 699 calculations?

Upvotes: 0

Views: 105

Answers (1)

Spacedman
Spacedman

Reputation: 94192

Using dplyr its a join on scode and year followed by grouping by Treaty and then working out the difference between the min and max polity:

require(dplyr)
left_join(treaties, Polity, c("scode","year")) %>% group_by(Treaty) %>% summarise(PolityDiff=max(polity,na.rm=TRUE)-min(polity,na.rm=TRUE))
Source: local data frame [8 x 2]

                   Treaty PolityDiff
1  albania macedonia 2002          2
2    albania moldova 2003          1
3    albania romania 2003          1
4     brazil uruguay 1986          2
5   hungary slovenia 1994          0
6   nicaragua taiwan 2006          2
7    Treaty of Izmir 1977         NA
8 ukraine uzbekistan 1994         16

The NA's are where you don't have any matching scode/year (The Treaty of Izmir is IRN/TUR/PAK in 1977, and none of those are in the Polity data).

Note that if you want NA if any one of the participating countries are not in the Polity data, use:

left_join(treaties, Polity, c("scode","year")) %>% group_by(Treaty) %>% summarise(PolityDiff=max(polity)-min(polity))

which gives:

                   Treaty PolityDiff
1  albania macedonia 2002          2
2    albania moldova 2003          1
3    albania romania 2003          1
4     brazil uruguay 1986          2
5   hungary slovenia 1994         NA
6   nicaragua taiwan 2006          2
7    Treaty of Izmir 1977         NA
8 ukraine uzbekistan 1994         16

because Slovenia is coded as SLV in Polity but there's SLO in the treaties - mistake? Anyway, there's no SLO/1994 in Polity so that treaty returns as NA in this variant. It returns zero in my first example because the NA gets dropped and the polity difference is the difference between one number and itself...

Upvotes: 1

Related Questions