Reputation: 41
The data in which I am trying to make my selection looks like
ID Field Rank 8 6 Other Prof 9 6 Other Prof 13 7 Other Assoc 16 7 Other Assoc 17 7 Other Prof 18 8 Other Assoc 19 8 Other Assoc 22 9 Other Assoc 23 9 Other Assoc 24 9 Other Prof
I am trying to create a new variable that contains all the rows of the people(ID) that have been promoted from 'Assoc' to 'Prof'. For example I would like my new variable to look like
ID Field Rank 13 7 Other Assoc 16 7 Other Assoc 17 7 Other Prof 22 9 Other Assoc 23 9 Other Assoc 24 9 Other Prof
I have tried the subset function but with no luck.
Is there a function in R that can do this? If not, how can this be achieved.
EDIT: here, is the result from dput()
. Note I left out the "Field" variable since it does not contain any information in this example.
df.promotion <- structure(list(id = c(6, 6, 7, 7, 7, 8, 8, 9, 9, 9), rank = structure(c(2L,
2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L), .Label = c("Assoc", "Prof"
), class = "factor")), .Names = c("id", "rank"), row.names = c(NA,
-10L), class = "data.frame")
Upvotes: 4
Views: 199
Reputation: 55695
Here is the customary one liner using plyr
. The code works by (a) splitting the data frame by id, and (b) choosing only those subsets
with greater than 1 unique rank (which is a proxy for promotion)
require(plyr)
ddply(df.promotion, .(id), subset, length(unique(rank)) > 1)
Upvotes: 1
Reputation: 60756
Here's a fairly easy to understand approach that uses your first inclination to do this using subset()
:
I create p
which is the id
of everyone who's a Prof. Then I create a
which is everyone who's an associate. Then using %in%
we select everyone who's been both an Assoc and a Prof. That gives me a set of keys which I can then use to subset the initial data.frame.
p <- unique(subset(df.promotion, rank=="Prof")$id)
a <- unique(subset(df.promotion, rank=="Assoc")$id)
mySet <- a[a %in% p]
subset(df.promotion, id %in% mySet)
Upvotes: 2
Reputation: 11956
Let's do this with base R (though plyr
is beckoning):
Edit adapted + tested to the newly provided dput
output...
dfr<-df.promotion #just so I don't have to change too much below
colnames(dfr)<-c("ID", "Rank") #just so I don't have to change too much below
promotedIDs<-unique(dfr$ID)[sapply(unique(dfr$ID), function(curID){
hasBoth<-(sum(is.na(match(c("Assoc", "Prof"), dfr$Rank[dfr$ID==curID]))) == 0)
})]
result<-dfr[dfr$ID %in% promotedIDs,]
I check, with match
whether both "Prof" and "Assoc" are in the list of Ranks for each ID. Note that match
returns NA
if a value is not found, so counting the number of NA
s is a way of finding out whether both matched.
Upvotes: 1
Reputation: 66834
You can use xtabs
to tabulate your data by ID
and Rank
:
tab <- xtabs(~ID+Rank,dfr)
tab
Rank
ID Assoc Prof
6 0 2
7 2 1
8 2 0
9 2 1
You want the ones where no zero occurs:
subset(dfr,ID %in% rownames(tab[as.logical(apply(tab,1,prod)),]))
ID Field Rank
13 7 Other Assoc
16 7 Other Assoc
17 7 Other Prof
22 9 Other Assoc
23 9 Other Assoc
24 9 Other Prof
Upvotes: 4