AcidCatfish

Reputation: 324

How to organize based on specific data values

I have this data frame:

Data <- structure(list(ID = c(101, 102, 103, 104, 105, 106
), `1Var` = c(1, 3, 3, 1, 1, 1), `2Var` = c(1, 1, 
1, 1, 1, 1), `3Var` = c(3, 1, 1, 1, 1, 1), `4Var` = c(1, 
1, 1, 1, 1, 1)), row.names = c(NA, 6L), class = "data.frame")

I have been trying to subset rows based on values of 1 and 0. This sample has no 0 values, but my full data does.

I toyed around with this method:

Prime <- grep('$Var', names(Data))
DataPrime <- Data[rowSums(Data[Prime] <= 1),]

I am getting duplicated observations, though. Another issue with this method is that it keeps every row that contains a 1 or 0, not just the rows that contain ONLY 1s or 0s. So a row that has a 3 in one variable and 1 in all the others is still kept in my data.

I think my method will work but I'm not sure what else I need to specify in the argument. I tried a simple subset too but that removed everything from the data:

DataPrime <- subset(Data, '1Var' <=1, '2Var' <=1, '3Var' <=1, '4Var' <=1)

I essentially want my data to look something like this:

   ID 1Var 2Var 3Var 4Var
4 104    1    1    1    1
5 105    1    1    1    1
6 106    1    1    1    1

Upvotes: 1

Views: 70

Answers (2)

akrun

Reputation: 886978

We can use Reduce with & to combine the per-column comparisons into a single logical vector for subsetting the rows:

subset(Data, Reduce(`&`, lapply(Data[-1], `<=`, 1)))

-output

#   ID 1Var 2Var 3Var 4Var
#4 104    1    1    1    1
#5 105    1    1    1    1
#6 106    1    1    1    1
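To make the Reduce step easier to follow, here is a minimal sketch that builds the same logical vector in two stages. The intermediate names `checks` and `keep` are illustrative only, and `Data` is rebuilt from the question's sample so the snippet runs on its own.

```r
# Sample data from the question; check.names = FALSE keeps the
# non-syntactic column names (1Var, 2Var, ...) unchanged.
Data <- data.frame(
  ID = c(101, 102, 103, 104, 105, 106),
  `1Var` = c(1, 3, 3, 1, 1, 1),
  `2Var` = c(1, 1, 1, 1, 1, 1),
  `3Var` = c(3, 1, 1, 1, 1, 1),
  `4Var` = c(1, 1, 1, 1, 1, 1),
  check.names = FALSE
)

# Stage 1: one logical vector per column (ID dropped), TRUE where value <= 1
checks <- lapply(Data[-1], `<=`, 1)

# Stage 2: fold the list with elementwise AND, so a row is TRUE
# only if every column passed the check
keep <- Reduce(`&`, checks)
keep
# FALSE FALSE FALSE  TRUE  TRUE  TRUE

Data[keep, ]
```

Indexing with `Data[keep, ]` is equivalent here to passing the expression to subset; it just exposes the intermediate vector for inspection.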

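Another option is rowSums, which counts the values greater than 1 in each row; negating that count keeps only the rows where it is zero: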

subset(Data, !rowSums(Data[-1] > 1))
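The rowSums version relies on `!` turning a count of 0 into TRUE and any non-zero count into FALSE. A short sketch of the intermediate values, assuming the question's sample data (the names `over` and `result` are illustrative only):

```r
# Sample data from the question, rebuilt so the snippet is self-contained
Data <- data.frame(
  ID = c(101, 102, 103, 104, 105, 106),
  `1Var` = c(1, 3, 3, 1, 1, 1),
  `2Var` = c(1, 1, 1, 1, 1, 1),
  `3Var` = c(3, 1, 1, 1, 1, 1),
  `4Var` = c(1, 1, 1, 1, 1, 1),
  check.names = FALSE
)

# Number of values greater than 1 in each row (ID column excluded)
over <- rowSums(Data[-1] > 1)
unname(over)
# 1 1 1 0 0 0

# !over is TRUE exactly where the count is zero;
# over == 0 says the same thing more explicitly
result <- subset(Data, over == 0)
result$ID
# 104 105 106
```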

Upvotes: 3

Allan Cameron

Reputation: 173793

I think you're looking for something like:

Prime <-  grep('\\dVar', names(Data))
Data[apply(Data[Prime], 1, function(x) !any(x > 1)),]
#>    ID 1Var 2Var 3Var 4Var
#> 4 104    1    1    1    1
#> 5 105    1    1    1    1
#> 6 106    1    1    1    1

A few things to note are:

  • Your regex inside grep was wrong. The "$" symbol anchors the match at the end of a string; it does not match a digit. For digits you can use \\d . Your Prime variable is therefore empty in the example.
  • It's best not to have column names (or any variable names) starting with numbers. These are not syntactically valid names in R. You can get round this by surrounding them with backticks, but this is easy to overlook and is a source of bugs.
  • rowSums adds up the values in each row. Applied to the raw columns, the lowest sum of any row is 4, whereas rowSums(Data[Prime] <= 1) counts the entries in each row that are 1 or less, giving a vector like c(3, 3, 3, 4, 4, 4). Subsetting Data by this vector treats the counts as row numbers, so you get three copies of row 3 followed by three copies of row 4, which clearly isn't what you want.
  • In subset, you need the logical conjunction of all your var <= 1 terms, so you should split these with &, not with commas.
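The last two points can be checked directly on the sample data. The sketch below first reproduces the duplicated rows, then shows two possible repairs: comparing the count against the number of columns (a variation not given in the answers above, which keeps the rowSums approach from the question), and a subset call with backticked names joined by &. The names `counts`, `dup`, `fixed_counts`, and `fixed_subset` are illustrative only.

```r
# Sample data from the question, rebuilt so the snippet is self-contained
Data <- data.frame(
  ID = c(101, 102, 103, 104, 105, 106),
  `1Var` = c(1, 3, 3, 1, 1, 1),
  `2Var` = c(1, 1, 1, 1, 1, 1),
  `3Var` = c(3, 1, 1, 1, 1, 1),
  `4Var` = c(1, 1, 1, 1, 1, 1),
  check.names = FALSE
)

Prime <- grep("\\dVar", names(Data))   # column positions 2 3 4 5

# Per-row count of values <= 1: 3 3 3 4 4 4
counts <- rowSums(Data[Prime] <= 1)

# Pitfall: the counts are used as row numbers, so rows 3 and 4 repeat
dup <- Data[counts, ]
dup$ID
# 103 103 103 104 104 104

# Repair 1: keep rows where every Prime column passed the check
fixed_counts <- Data[counts == length(Prime), ]

# Repair 2: subset with backticked names combined by &, not commas
fixed_subset <- subset(Data, `1Var` <= 1 & `2Var` <= 1 & `3Var` <= 1 & `4Var` <= 1)

fixed_counts$ID
# 104 105 106
```

Both repairs return rows 104 to 106 on this sample; the count comparison scales better when there are many Var columns, since it does not require naming each one.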

Upvotes: 3
