Reputation: 324
I have this data frame:
Data <- structure(list(ID = c(101, 102, 103, 104, 105, 106
), `1Var` = c(1, 3, 3, 1, 1, 1), `2Var` = c(1, 1,
1, 1, 1, 1), `3Var` = c(3, 1, 1, 1, 1, 1), `4Var` = c(1,
1, 1, 1, 1, 1)), row.names = c(NA, 6L), class = "data.frame")
I have been trying to subset rows based on values of 1 and 0. This sample has no 0 values, but my full data does.
I toyed around with this method:
Prime <- grep('$Var', names(Data))
DataPrime <- Data[rowSums(Data[Prime] <= 1),]
I am getting duplicated observations, though. Another issue with this method is that it keeps every row that has a 1 or 0, rather than only the rows made up entirely of 1s and 0s. So a row that has a 3 in one variable and 1 in all the others is still kept in my data.
I think my method will work but I'm not sure what else I need to specify in the argument. I tried a simple subset too but that removed everything from the data:
DataPrime <- subset(Data, '1Var' <=1, '2Var' <=1, '3Var' <=1, '4Var' <=1)
I essentially want my data to look something like this:
ID 1Var 2Var 3Var 4Var
4 104 1 1 1 1
5 105 1 1 1 1
6 106 1 1 1 1
Upvotes: 1
Views: 70
Reputation: 886978
We can use Reduce with & to create a logical vector for subsetting the rows:
subset(Data, Reduce(`&`, lapply(Data[-1], `<=`, 1)))
-output
# ID 1Var 2Var 3Var 4Var
#4 104 1 1 1 1
#5 105 1 1 1 1
#6 106 1 1 1 1
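To make the one-liner above easier to follow, here is a small sketch that unrolls it into steps, assuming the sample data from the question. The intermediate names checks, keep and by_hand are made up for illustration and are not part of the original answer.

```r
# Sample data from the question; check.names = FALSE keeps names like 1Var
Data <- data.frame(
  ID = c(101, 102, 103, 104, 105, 106),
  `1Var` = c(1, 3, 3, 1, 1, 1),
  `2Var` = c(1, 1, 1, 1, 1, 1),
  `3Var` = c(3, 1, 1, 1, 1, 1),
  `4Var` = c(1, 1, 1, 1, 1, 1),
  check.names = FALSE
)

# One logical vector per value column: TRUE where the entry is <= 1
checks <- lapply(Data[-1], `<=`, 1)

# Reduce folds the list with `&`, so a row is TRUE only if every column passed
keep <- Reduce(`&`, checks)
print(keep)

# The same condition written out by hand, joined with & rather than commas.
# Backticks are needed because the column names start with a digit.
by_hand <- subset(Data, `1Var` <= 1 & `2Var` <= 1 & `3Var` <= 1 & `4Var` <= 1)
print(by_hand)
```

Writing the conditions out by hand is fine for four columns. The Reduce form is mainly useful when the number of columns is large or not known in advance.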
Or another option is rowSums
subset(Data, !rowSums(Data[-1] > 1))
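The rowSums version counts the offending values in each row and keeps the rows where that count is zero. A minimal sketch of the intermediate steps, assuming the sample data from the question (n_bad and keep are illustrative names, not from the answer):

```r
# Sample data from the question; check.names = FALSE keeps names like 1Var
Data <- data.frame(
  ID = c(101, 102, 103, 104, 105, 106),
  `1Var` = c(1, 3, 3, 1, 1, 1),
  `2Var` = c(1, 1, 1, 1, 1, 1),
  `3Var` = c(3, 1, 1, 1, 1, 1),
  `4Var` = c(1, 1, 1, 1, 1, 1),
  check.names = FALSE
)

# Count, per row, how many value columns hold something greater than 1
n_bad <- rowSums(Data[-1] > 1)
print(n_bad)

# `!` turns a count of 0 into TRUE and any other count into FALSE.
# Comparing against 0 expresses the same test more explicitly.
keep <- n_bad == 0
print(Data[keep, ])
```

If the full data contains NA values, rowSums returns NA for those rows unless na.rm = TRUE is passed. Whether such rows should be kept is a choice the question does not settle.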
Upvotes: 3
Reputation: 173793
I think you're looking for something like:
Prime <- grep('\\dVar', names(Data))
Data[apply(Data[Prime], 1, function(x) !any(x > 1)),]
#> ID 1Var 2Var 3Var 4Var
#> 4 104 1 1 1 1
#> 5 105 1 1 1 1
#> 6 106 1 1 1 1
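As a rough check of why this works where the attempt in the question did not, the sketch below compares the two grep patterns and then contrasts a logical row index with a numeric one. It assumes the sample data from the question. The names keep and counts are illustrative only.

```r
# Sample data from the question; check.names = FALSE keeps names like 1Var
Data <- data.frame(
  ID = c(101, 102, 103, 104, 105, 106),
  `1Var` = c(1, 3, 3, 1, 1, 1),
  `2Var` = c(1, 1, 1, 1, 1, 1),
  `3Var` = c(3, 1, 1, 1, 1, 1),
  `4Var` = c(1, 1, 1, 1, 1, 1),
  check.names = FALSE
)

# '$' anchors the end of the string, so nothing can follow it: no column matches
print(grep('$Var', names(Data)))

# '\\d' matches a single digit, so this finds the four value columns
Prime <- grep('\\dVar', names(Data))
print(Prime)

# Logical index: one TRUE/FALSE per row, so each row appears at most once
keep <- apply(Data[Prime], 1, function(x) !any(x > 1))
print(Data[keep, "ID"])

# Numeric index: the counts are read as row positions, so rows repeat
counts <- unname(rowSums(Data[Prime] <= 1))
print(Data[counts, "ID"])
```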
A few things to note are:
The pattern passed to grep was wrong. The "$" symbol represents the end of a string, not a number. For numbers you can use \\d. Your Prime variable is therefore empty in the example.
rowSums adds up all the values in each row, so the lowest sum of any of the rows is 4, whereas rowSums(Data[Prime] <= 1) gives the total number of entries that are one or less, giving a vector like c(3, 3, 3, 4, 4, 4). Subsetting Data by this will give three copies of row 3 then three copies of row 4, which clearly isn't what you want.
With subset, you need the logical conjunction of all your var <= 1 terms, so you should split these with &, not with commas.
Upvotes: 3