Reputation: 311
I know there are good tools for creative subsetting, but I'm not familiar with them, so your help is very much appreciated. I went through similar questions and couldn't find an answer, but please point me to it if you think this is a duplicate.
Let's assume a df looking like this:
   Pop Loc BP
1    1   a 10
2    2   a 10
3    3   a 10
4    4   a 10
5    3   a 50
6    2   c 21
7    1   d 33
8    2   d  8
9    3   d  8
10   4   d  8
I want to identify which Loc are present in all 4 levels of Pop, but considering Loc in combination with BP (i.e. in the above example row 5 and row 3 are different). The desired output should look like this:
  Pop Loc BP
1   1   a 10
2   2   a 10
3   3   a 10
4   4   a 10
In this example only the first 4 rows of df meet the condition, as the combination Loc=="a" and BP==10 exists in Pop 1, 2, 3 and 4.
Row 5 should be excluded because the combination Loc=="a" and BP==50 is only present in Pop 3, and rows 8-10 do not meet the condition because the combination Loc=="d" and BP==8 is not present in Pop 1 (row 7 has Loc=="d" but BP==33, which occurs only in Pop 1).
The solution has to be general and reasonably efficient, as in the real dataset the number of levels of Loc and BP is around 4,000 (Pop remains small).
I was thinking of using paste() to "merge" Loc and BP into a new column and then keep only the ones that appear the desired number of times (4 in this example). But I'm sure there is a better way.
Thanks
dput() output to create df:
df <- structure(list(Pop = c(1, 2, 3, 4, 3, 2, 1, 2, 3, 4), Loc = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L), .Label = c("a", "c", "d"
), class = "factor"), BP = c(10, 10, 10, 10, 50, 21, 33, 8, 8,
8)), .Names = c("Pop", "Loc", "BP"), row.names = c(NA, -10L), class = "data.frame")
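A minimal sketch of the paste() idea described above, in base R. The names key and nPop are made up for illustration, and the data frame is rebuilt with data.frame() only so the snippet runs on its own. It counts distinct Pop values per Loc/BP combination rather than raw occurrences, on the assumption that a combination could appear more than once within the same Pop in the real data.

```r
# Rebuild the example data (same values as the dput() output)
df <- data.frame(
  Pop = c(1, 2, 3, 4, 3, 2, 1, 2, 3, 4),
  Loc = factor(c("a", "a", "a", "a", "a", "c", "d", "d", "d", "d")),
  BP  = c(10, 10, 10, 10, 50, 21, 33, 8, 8, 8)
)

# Hypothetical helper: one key per Loc/BP combination
key <- paste(df$Loc, df$BP, sep = "_")

# For every row, the number of distinct Pop values that share its key
nPop <- ave(df$Pop, key, FUN = function(p) length(unique(p)))

# Keep only rows whose Loc/BP combination occurs in every Pop
result <- df[nPop == length(unique(df$Pop)), ]
result
```

Because ave() returns a vector aligned with the rows of df, no merge back onto the original data is needed.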
Upvotes: 1
Views: 155
Reputation: 121568
For example, using plyr, you can create a new id (using interaction) and then process your comparisons by this id:
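library(plyr)
# Build an id for each Loc/BP combination, then keep only the groups
# whose Pop column contains all of 1:4
ddply(transform(df, id = interaction(Loc, BP)), .(id),
      function(x) if (all(1:4 %in% x$Pop)) x)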
  Pop Loc BP   id
1   1   a 10 a.10
2   2   a 10 a.10
3   3   a 10 a.10
4   4   a 10 a.10
Upvotes: 1
Reputation: 2508
A very general solution using base R, where you can specify the grouping columns, the column holding your required values, and the required values themselves:
subsetCustom <- function(
    data,
    INDICES,
    requiredValueCol,
    requiredValues)
{
  # Apply the check to each group defined by INDICES
  subsetData <- by(
    data = data,
    INDICES = INDICES,
    FUN = function(subdata, requiredValueCol,
                   requiredValues) {
      # Keep the group only if it contains every required value
      if (all(requiredValues %in% subdata[, requiredValueCol]))
        out <- subdata
      else out <- NULL
      return(out)
    },
    requiredValueCol = requiredValueCol,
    requiredValues = requiredValues)
  # Bind the surviving groups back into one data frame
  subsetData <- do.call(rbind, subsetData)
  return(subsetData)
}

subsetCustom(
  data = df,
  INDICES = list(df$Loc, df$BP),
  requiredValueCol = "Pop",
  requiredValues = 1:4)
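The same group-then-filter logic can also be sketched more compactly with split() and Filter(). This is an illustrative variant, not a replacement for the function above; the names groups, kept and result are made up, and the data frame is rebuilt here only so the snippet runs on its own.

```r
# Rebuild the example data (same values as the question's dput() output)
df <- data.frame(
  Pop = c(1, 2, 3, 4, 3, 2, 1, 2, 3, 4),
  Loc = factor(c("a", "a", "a", "a", "a", "c", "d", "d", "d", "d")),
  BP  = c(10, 10, 10, 10, 50, 21, 33, 8, 8, 8)
)

# One data frame per Loc/BP combination; drop = TRUE skips empty combinations
groups <- split(df, list(df$Loc, df$BP), drop = TRUE)

# Keep only the groups whose Pop column contains every required value
kept <- Filter(function(g) all(1:4 %in% g$Pop), groups)

# unname() keeps the original row names instead of prefixing the group id
result <- do.call(rbind, unname(kept))
result
```

If no group satisfies the condition, kept is an empty list and result is NULL, which matches what do.call(rbind, ...) gives in the by()-based version.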
Upvotes: 1