Reputation: 35
I was wondering if anyone could help me with a problem I'm having in R that involves looping over both columns and rows. The example below should hopefully make it clear. I have the 5x5 table below. Taking row 1 as an example, I would like to count how many of the values in V2:V5 are lower than the value in V1, and express that count as a proportion (a decimal).
set.seed(1)
data=as.data.frame(replicate(5, rnorm(5)))
V1 V2 V3 V4 V5
1 -0.6264538 -0.8204684 1.5117812 -0.04493361 0.91897737
2 0.1836433 0.4874291 0.3898432 -0.01619026 0.78213630
3 -0.8356286 0.7383247 -0.6212406 0.94383621 0.07456498
4 1.5952808 0.5757814 -2.2146999 0.82122120 -1.98935170
5 0.3295078 -0.3053884 1.1249309 0.59390132 0.61982575
test=lapply(2:5,function(a){
ifelse(data[1,1]<=data[1,a],1,0)})
testtable=(as.data.frame(table(unlist(test)))[1,2])/4
testtable
[1] 0.25
This means that in row 1, only 1 of the 4 values in V2:V5 is lower than V1. I'd like to add an outer loop so that this is done for each row separately. I tried:
test2=lapply(2:5,function(a){
lapply(1:5,function(b){
ifelse(original_permuted_results[b,1]<=original_permuted_results[a,b],1,0)
(as.data.frame(table(unlist(test)))[1,2])/4})})
This results in:
[[1]]
[[1]][[1]]
[1] 0.25
[[1]][[2]]
[1] 0.25
[[1]][[3]]
[1] 0.25
[[1]][[4]]
[1] 0.25
[[1]][[5]]
[1] 0.25
[[2]]
[[2]][[1]]
[1] 0.25
It continues like that, printing 0.25 for every remaining iteration. Ignoring the words in brackets, it should produce:
(for row 1) 0.25
(for row 2) 0.25
(for row 3) 0
(for row 4) 1
(for row 5) 0.25
I had a trawl through the archives but couldn't find anything. My actual data has 300+ rows and 10000 columns, but the output I'm trying to achieve is exactly the same: one proportion per row. Any suggestions would be very much appreciated. Thanks.
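If an explicit loop is still wanted, a repaired version only needs to loop over rows. In the attempt above, the inner function's last expression reuses test (the row-1 result), so every iteration returns 0.25. Note also that table(...)[1,2] counts whichever level sorts first: for row 3 no value is lower, the table only has the level "1", and the code would give 4/4 = 1 instead of 0. A minimal sketch, assuming the reference value is always in the first column:

```r
set.seed(1)
data <- as.data.frame(replicate(5, rnorm(5)))

# Convert once to a numeric matrix; extracting a row from a matrix is
# cheaper than from a data frame when there are thousands of columns.
m <- as.matrix(data)

# For each row b, compare columns 2..ncol(m) with column 1. mean() of a
# logical vector is the proportion of TRUEs, so no hard-coded 4 is needed.
props <- sapply(seq_len(nrow(m)), function(b) mean(m[b, -1] < m[b, 1]))
props
```

Using mean() on the logical comparison replaces both the ifelse() call and the table() lookup, and it stays correct when a row has no lower values at all.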
Upvotes: 0
Views: 806
Reputation: 3711
Does this work?
vec<-rowSums(data<data$V1)/4
> vec
[1] 0.25 0.25 0.00 1.00 0.25
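The /4 here is tied to the 5-column example, and data < data$V1 also compares V1 with itself (always FALSE, which is why dividing by 4 rather than 5 still works). For the question's 300 x 10000 data, a variant that converts to a matrix once, drops the reference column and lets rowMeans() do the division might look like this; it is a sketch that assumes the reference values sit in the first column:

```r
set.seed(1)
data <- as.data.frame(replicate(5, rnorm(5)))

m <- as.matrix(data)
# m[, -1] drops the reference column; the vector m[, 1] is recycled down each
# remaining column, so row i of every column is compared with m[i, 1].
lower <- m[, -1] < m[, 1]
# rowMeans() of a logical matrix gives the proportion of TRUEs per row,
# replacing the hard-coded division by 4.
vec <- rowMeans(lower)
vec
```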
Upvotes: 2
Reputation: 18323
Very similar to @BrodieG's answer, but perhaps a little clearer:
# Find when each column is less than the first column.
lower.than.first<-sapply(data[2:5],function(x) x<data[,1])
# Count how many columns are lower in each row.
num.true<-rowSums(lower.than.first) # TRUE is 1, and FALSE is 0, when summing.
# Get the proportion.
props<-num.true/ncol(lower.than.first)
# [1] 0.25 0.25 0.00 1.00 0.25
Upvotes: 0
Reputation: 52687
You don't need loops. You can take advantage of vectorization:
cat(paste("(for row", 1:nrow(data), ")",
          rowSums(data[, 1] > data[, 2:5]) / 4),  # this is where it all happens
    sep = "\n"
)
Produces:
(for row 1 ) 0.25
(for row 2 ) 0.25
(for row 3 ) 0
(for row 4 ) 1
(for row 5 ) 0.25
Here we take advantage of > coercing the RHS to a matrix in order to do the comparison.
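To see what that comparison actually returns, the pieces can be inspected one at a time. A small sketch using the data object from the question: comparing a length-5 vector with a 5 x 4 data frame goes through R's Ops.data.frame method, which recycles the vector down each column and returns a logical matrix.

```r
set.seed(1)
data <- as.data.frame(replicate(5, rnorm(5)))

# Element i of data[, 1] is compared with row i of every column in V2:V5;
# the result is a logical matrix, not a data frame.
cmp <- data[, 1] > data[, 2:5]
is.matrix(cmp)
dim(cmp)
rowSums(cmp)      # number of columns in V2:V5 below V1, per row
rowSums(cmp) / 4  # the proportion printed by cat() above
```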
Upvotes: 3