Milhouse

Reputation: 177

Dealing with ties using rank (R)

I'm trying to create a dummy variable for whether a child is first born, and another for whether the child is second born. My data looks something like this:

ID   MID   CMOB   CYRB      
1    1     1      1991
2    1     7      1989
3    2     1      1985
4    2     11     1985
5    2     9      1994
6    3     4      1992
7    4     2      1992
8    4     10     1983

Here ID = child ID, MID = mother ID, CMOB = month of birth, and CYRB = year of birth.

For the first born dummy I tried using this:

Identifiers_age <- Identifiers_age %>% group_by(MID) %>%
                          mutate(first = as.numeric(rank(CYRB) == 1))

But there doesn't seem to be a way to break ties by the rank of another column (in this case, CMOB). Whenever I try using the "ties.method" argument, it tells me the input must be a character vector.

Am I missing something here?

Upvotes: 4

Views: 1125

Answers (2)

akrun

Reputation: 887991

If we still want to use rank, we can convert 'CYRB' and 'CMOB' into a 'Date', apply rank to it, and then get the binary output from the resulting logical vector:

Identifiers_age %>%
         group_by(MID) %>% 
         mutate(first = as.integer(rank(as.Date(paste(CYRB, CMOB, 1,
                  sep="-"), "%Y-%m-%d"))==1))
#     ID   MID  CMOB  CYRB first
#  <int> <int> <int> <int> <int>
#1     1     1     1  1991     0
#2     2     1     7  1989     1
#3     3     2     1  1985     1
#4     4     2    11  1985     0
#5     5     2     9  1994     0
#6     6     3     4  1992     1
#7     7     4     2  1992     0
#8     8     4    10  1983     1

Or we can combine year and month arithmetically into a single numeric key and apply rank to that:

Identifiers_age %>% 
         group_by(MID) %>%
         mutate(first = as.integer(rank(CYRB + CMOB/12)==1))
#     ID   MID  CMOB  CYRB first
#   <int> <int> <int> <int> <int>
#1     1     1     1  1991     0
#2     2     1     7  1989     1
#3     3     2     1  1985     1
#4     4     2    11  1985     0
#5     5     2     9  1994     0
#6     6     3     4  1992     1
#7     7     4     2  1992     0
#8     8     4    10  1983     1
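
The question also asks for a second-born dummy. The same combined-key idea extends to it; the following is a minimal sketch, not a definitive implementation. It assumes dplyr is installed, rebuilds the sample data from the question, and uses a whole-month index (CYRB * 12 + CMOB) in place of CYRB + CMOB/12. The column name birth_order is hypothetical, and ties.method = "min" is an assumption: siblings born in the same month and year would share a rank.

```r
library(dplyr)

# Sample data copied from the question.
Identifiers_age <- data.frame(
  ID   = 1:8,
  MID  = c(1, 1, 2, 2, 2, 3, 4, 4),
  CMOB = c(1, 7, 1, 11, 9, 4, 2, 10),
  CYRB = c(1991, 1989, 1985, 1985, 1994, 1992, 1992, 1983)
)

result <- Identifiers_age %>%
  group_by(MID) %>%
  mutate(
    # Rank children within each mother by a whole-month index.
    # birth_order is a hypothetical helper column, not part of the original data.
    birth_order = rank(CYRB * 12 + CMOB, ties.method = "min"),
    first  = as.integer(birth_order == 1),
    second = as.integer(birth_order == 2)
  ) %>%
  ungroup()

result$first   # 0 1 1 0 0 1 0 1
result$second  # 1 0 0 1 0 0 1 0
```

Keeping the rank in its own column means further dummies (third born, and so on) only need one extra comparison each.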

Upvotes: 1

akuiper

Reputation: 215137

order might be more convenient to use here, since it breaks ties using further columns. From ?order:

order returns a permutation which rearranges its first argument into ascending or descending order, breaking ties by further arguments.

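# order() returns row positions, not ranks, so a second order() inverts the
# permutation to give each row's rank within its group.
Identifiers_age <- Identifiers_age %>% group_by(MID) %>% 
                   mutate(first = as.numeric(order(order(CYRB, CMOB)) == 1))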
Identifiers_age

#Source: local data frame [8 x 5]
#Groups: MID [4]

#     ID   MID  CMOB  CYRB first
#  <int> <int> <int> <int> <dbl>
#1     1     1     1  1991     0
#2     2     1     7  1989     1
#3     3     2     1  1985     1
#4     4     2    11  1985     0
#5     5     2     9  1994     0
#6     6     3     4  1992     1
#7     7     4     2  1992     0
#8     8     4    10  1983     1
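
One caveat worth checking: order returns a permutation of row positions rather than ranks. The two coincide on the sample data above, but they can differ in groups of three or more children whose rows are not already close to birth order. A minimal base R sketch, using hypothetical birth dates for three siblings (the vectors yrb and mob are made up for illustration):

```r
# Hypothetical siblings, listed in arbitrary row order.
yrb <- c(1992, 1994, 1985)
mob <- c(3, 6, 1)

perm  <- order(yrb, mob)  # row positions, earliest birth first: 3 1 2
ranks <- order(perm)      # inverse permutation = birth rank of each row: 2 3 1

first_via_order <- as.integer(perm == 1)   # 0 1 0, flags the youngest child
first_via_rank  <- as.integer(ranks == 1)  # 0 0 1, flags the oldest child
```

Applying order twice turns the permutation into ranks while keeping the tie-breaking by further arguments, so order(order(CYRB, CMOB)) == 1 is the safer test inside mutate.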

Upvotes: 4
