user7045
user7045

Reputation: 41

Selecting a IDs with specified column attributes

The data in which I am trying to make my selection looks like

   ID Field  Rank
8    6 Other  Prof
9    6 Other  Prof
13   7 Other Assoc
16   7 Other Assoc
17   7 Other  Prof
18   8 Other Assoc
19   8 Other Assoc
22   9 Other Assoc
23   9 Other Assoc
24   9 Other  Prof

I am trying to create a new variable that contains all the rows of the people(ID) that have been promoted from 'Assoc' to 'Prof'. For example I would like my new variable to look like

   ID Field  Rank
13   7 Other Assoc
16   7 Other Assoc
17   7 Other  Prof
22   9 Other Assoc
23   9 Other Assoc
24   9 Other  Prof

I have tried the subset function but with no luck.

Is there a function in R that can do this? If not, how can this be achieved.

EDIT: here, is the result from dput(). Note I left out the "Field" variable since it does not contain any information in this example.

 df.promotion <- structure(list(id = c(6, 6, 7, 7, 7, 8, 8, 9, 9, 9), rank = structure(c(2L, 
 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L), .Label = c("Assoc", "Prof"
 ), class = "factor")), .Names = c("id", "rank"), row.names = c(NA, 
 -10L), class = "data.frame")

Upvotes: 4

Views: 199

Answers (4)

Ramnath
Ramnath

Reputation: 55695

Here is the customary one liner using plyr. The code works by (a) splitting the data frame by id, and (b) choosing only those subsets with greater than 1 unique rank (which is a proxy for promotion)

require(plyr)
ddply(df.promotion, .(id), subset, length(unique(rank)) > 1)

Upvotes: 1

JD Long
JD Long

Reputation: 60756

Here's a fairly easy to understand approach that uses your first inclination to do this using subset():

I create p which is the id of everyone who's a Prof. Then I create a which is everyone who's an associate. Then using %in% we select everyone who's been both an Assoc and a Prof. That gives me a set of keys which I can then use to subset the initial data.frame.

p <- unique(subset(df.promotion, rank=="Prof")$id)
a <- unique(subset(df.promotion, rank=="Assoc")$id)

mySet <- a[a %in% p]
subset(df.promotion, id %in% mySet)

Upvotes: 2

Nick Sabbe
Nick Sabbe

Reputation: 11956

Let's do this with base R (though plyr is beckoning): Edit adapted + tested to the newly provided dput output...

dfr<-df.promotion #just so I don't have to change too much below
colnames(dfr)<-c("ID", "Rank") #just so I don't have to change too much below
promotedIDs<-unique(dfr$ID)[sapply(unique(dfr$ID), function(curID){
  hasBoth<-(sum(is.na(match(c("Assoc", "Prof"), dfr$Rank[dfr$ID==curID]))) == 0)
})]
result<-dfr[dfr$ID %in% promotedIDs,]

I check, with match whether both "Prof" and "Assoc" are in the list of Ranks for each ID. Note that match returns NA if a value is not found, so counting the number of NAs is a way of finding out whether both matched.

Upvotes: 1

James
James

Reputation: 66834

You can use xtabs to tabulate your data by ID and Rank:

tab <- xtabs(~ID+Rank,dfr)
tab
   Rank
ID  Assoc Prof
  6     0    2
  7     2    1
  8     2    0
  9     2    1

You want the ones where no zero occurs:

subset(dfr,ID %in% rownames(tab[as.logical(apply(tab,1,prod)),]))
   ID Field  Rank
13  7 Other Assoc
16  7 Other Assoc
17  7 Other  Prof
22  9 Other Assoc
23  9 Other Assoc
24  9 Other  Prof

Upvotes: 4

Related Questions