PrincessJellyfish

Reputation: 149

Replace values in data frame based upon a condition in R

I have a data frame (df) with 172 rows and 92 columns, and a vector (limit). The df is structured as follows:

          Sample1 Sample2 Sample3   ...  Sample92
Person 1   5.8      1.2     3.3     ...     ...
Person 2   5.2      3.4     6.2     ...    
Person 3   8.3      5.0     6.3     ...
    .
Person 172 ....

The vector limit has 92 elements (5.3, 4.8, 6.1, ...), one per column of df.

I now want to replace the values in my df with either 1 or 0, depending on whether the value is larger than the element of limit corresponding to its column. That is, all elements in the first column larger than 5.3 should be replaced with 1 and the others with 0; the second column should be compared against 4.8, and so on.

So my df above would look like:

          Sample1 Sample2 Sample3   ...  Sample92
Person 1   1        0       0       ...     ...
Person 2   0        0       1       ...    
Person 3   1        1       1       ...
    .
Person 172 ....

I tried writing the following code, but, as you can probably see, it doesn't work:

dfcopy<-df
for (i in 1:92){
  dfcopy[i]<-if(dfcopy[,i]>=limit[i]) 
 {1}
  else{0}  
}
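For reference, here is a minimal corrected sketch of the loop above. The main problem is that `if` is not vectorized: it expects a single TRUE/FALSE, so given a whole column it either uses only the first element with a warning (older R) or stops with an error (R 4.2.0 and later). Comparing the whole column at once and converting the logical result with as.integer() avoids `if` entirely. The 3-column df and limit below are only sample data taken from the question. Since the question asks for "larger than", this uses `>`. Switch to `>=` if values equal to the limit should also become 1.

```r
# Sample data: the first 3 columns/rows of the df described in the question
df <- data.frame(Sample1 = c(5.8, 5.2, 8.3),
                 Sample2 = c(1.2, 3.4, 5.0),
                 Sample3 = c(3.3, 6.2, 6.3),
                 row.names = c("Person 1", "Person 2", "Person 3"))
limit <- c(5.3, 4.8, 6.1)

dfcopy <- df
for (i in seq_along(limit)) {
  # Compare the whole column i with its limit in one vectorized step,
  # then turn TRUE/FALSE into 1/0
  dfcopy[[i]] <- as.integer(df[[i]] > limit[i])
}
dfcopy
```

dfcopy stays a data frame and matches the 1/0 table shown in the question.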

Upvotes: 1

Views: 780

Answers (2)

Cagg

Reputation: 159

You can create a matrix from your limit vector in which every row holds the full vector, something like this:

mat <- matrix(rep(limit, 172), ncol = 92, byrow = TRUE)

Then you can compare your data frame with this matrix cell by cell and use ifelse to turn the result into 1/0 (the result is a matrix; wrap it in as.data.frame() if you need a data frame):

result_df <- ifelse(df > mat, 1, 0)
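A runnable sketch of this matrix approach on the 3x3 sample data (assumed from the question), using nrow(df) and ncol(df) instead of the hard-coded 172 and 92. Comparing df against the plain limit vector instead would recycle limit down the rows rather than across the columns, which is why the matrix is built first.

```r
# Sample data: the first 3 columns/rows of the df described in the question
df <- data.frame(Sample1 = c(5.8, 5.2, 8.3),
                 Sample2 = c(1.2, 3.4, 5.0),
                 Sample3 = c(3.3, 6.2, 6.3),
                 row.names = c("Person 1", "Person 2", "Person 3"))
limit <- c(5.3, 4.8, 6.1)

# One row per person, each row holding the whole limit vector,
# so mat[i, j] equals limit[j]
mat <- matrix(rep(limit, nrow(df)), ncol = ncol(df), byrow = TRUE)

# Cell-by-cell comparison against a matrix of the same shape;
# ifelse() keeps the row and column names of the comparison result
result_df <- ifelse(df > mat, 1, 0)
result_df
```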

Upvotes: 0

akrun

Reputation: 886938

You can use

 +(df > limit[col(df)])
 #           Sample1 Sample2 Sample3
 #Person 1       1       0       0
 #Person 2       0       0       1
 #Person 3       1       1       1

This works because we are comparing objects of equal length. In this example, 'df' has 3 columns and 'limit' has 3 elements; by expanding 'limit' to one value per cell of 'df', the comparison is done element by element. Here col(df) gives the numeric column index of each cell of 'df':

 col(df)
 #     [,1] [,2] [,3]
 #[1,]    1    2    3
 #[2,]    1    2    3
 #[3,]    1    2    3

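Indexing 'limit' with this matrix therefore repeats the first element of 'limit' 3 times (once per row), then the second element 3 times, and so on, which matches the column-by-column order in which the cells of 'df' are compared.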
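To make the expansion concrete, here is a small sketch on the sample data (assumed from the question). The name `expanded` is only illustrative.

```r
# Sample data: the first 3 columns/rows of the df described in the question
df <- data.frame(Sample1 = c(5.8, 5.2, 8.3),
                 Sample2 = c(1.2, 3.4, 5.0),
                 Sample3 = c(3.3, 6.2, 6.3),
                 row.names = c("Person 1", "Person 2", "Person 3"))
limit <- c(5.3, 4.8, 6.1)

# col(df) holds the column number of every cell; using it as an index
# picks limit[j] for every cell in column j, in column-major order
expanded <- limit[col(df)]
expanded
# [1] 5.3 5.3 5.3 4.8 4.8 4.8 6.1 6.1 6.1

# Equivalent to repeating each element of limit once per row
stopifnot(identical(expanded, rep(limit, each = nrow(df))))
```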

The > comparison then creates a logical matrix. Its TRUE/FALSE values can be coerced to binary (1/0) form by adding 0L (+0L), by multiplying by 1L (*1L), or, more compactly, with a unary plus, +().
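A quick sketch, again on the sample data, showing that the three coercions give the same integer matrix with the names of 'df' kept. The variable names are only illustrative.

```r
# Sample data: the first 3 columns/rows of the df described in the question
df <- data.frame(Sample1 = c(5.8, 5.2, 8.3),
                 Sample2 = c(1.2, 3.4, 5.0),
                 Sample3 = c(3.3, 6.2, 6.3),
                 row.names = c("Person 1", "Person 2", "Person 3"))
limit <- c(5.3, 4.8, 6.1)

lgl <- df > limit[col(df)]   # logical matrix with df's row/column names

via_add  <- lgl + 0L   # add an integer zero
via_mult <- lgl * 1L   # multiply by an integer one
via_plus <- +(lgl)     # unary plus

# All three produce the same integer matrix
stopifnot(identical(via_add, via_mult), identical(via_add, via_plus))
storage.mode(via_add)
```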

data

 df <- structure(list(Sample1 = c(5.8, 5.2, 8.3), Sample2 = c(1.2, 3.4, 
 5), Sample3 = c(3.3, 6.2, 6.3)), .Names = c("Sample1", "Sample2", 
 "Sample3"), class = "data.frame", row.names = c("Person 1", "Person 2", 
 "Person 3"))
 limit <- c(5.3, 4.8, 6.1)
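Both approaches return a matrix. If the result should stay a data frame, one alternative (not taken from either answer) is to apply the rule column by column with Map() and assign it back with `res[] <-`, which keeps the data frame class and its row and column names. This is a sketch on the sample data above; `res` is an illustrative name.

```r
# Sample data: the first 3 columns/rows of the df described in the question
df <- data.frame(Sample1 = c(5.8, 5.2, 8.3),
                 Sample2 = c(1.2, 3.4, 5.0),
                 Sample3 = c(3.3, 6.2, 6.3),
                 row.names = c("Person 1", "Person 2", "Person 3"))
limit <- c(5.3, 4.8, 6.1)

res <- df
# Map() walks the columns of df and the elements of limit in parallel;
# assigning into res[] keeps res a data frame with the same names
res[] <- Map(function(x, l) as.integer(x > l), df, limit)
res
```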

Upvotes: 3
