pani_lebrun

Reputation: 19

R: comparing with few values

I've got 2 files:

eg
V1      V2    V3     V4        V5        V6        V7       V8 V9
1 0.066 0.71125  1.77 0.5045 0.7417104  1.584007  0.872757 1.729945  4
2 0.500 6.07500 20.30 1.7500 9.5017100 17.255490 11.180490 6.388851  4
3 0.670 0.67000  0.67 0.6700 0.0000000  0.670000  0.670000 0.000000  1

and

kl
 I    II   III    IV     
1  0.80  0.60  0.40  0.20
2  0.75  0.55  0.35  0.15 
3 65.60 50.70 38.80 24.00

I'd like to compare every V2 value from "eg" with the corresponding row of "kl", because I need to determine which class each row falls into (I, II, III, IV, or V for the rest) and write that into a new column. E.g.

if eg[1,2] >= kl[1,1] --> I
if eg[1,2] >= kl[2,1] --> II
if eg[1,2] >= kl[3,1] --> III
if eg[1,2] >= kl[4,1] --> IV
else --> V

and the same for eg[2,2] and eg[3,2].

I prepared my first loop, iterating like this, but (of course) it doesn't work:

eg <- read.csv("eg.csv", header=F, sep=";")
eg <- eg[, -c(1,3,4,5,6,7,8,9)]
eg <- t(eg)
as.numeric(as.character(eg))

for (i in eg) {
if (is.na(eg[i,1]) || eg[i,1] == "NA") {
cat(("0"), sep=";")
} else if (eg[i,1] >= kl[i,1]) {
cat(("1"), sep=";")
} else if (eg[i,1] >= kl[i,2]) {
cat(("2"), sep=";")
} else if (eg[i,1] >= kl[i,3]) {
cat(("3"), sep=";")
} else if (eg[i,1] >= kl[i,4]) {
cat(("4"), sep=";") 
} else {
cat(("5"), sep=";")
next}
}

R returns correct values only for the first two rows, then it prints:

 00Error in if (is.na(eg[i, 1]) || eg[i, 1] == "NA") { : 
      missing value where TRUE/FALSE needed

But when I did the same for each row separately, it worked. Now it doesn't :(

Please, help me! And thank you!

Upvotes: 0

Views: 129

Answers (1)

IRTFM

Reputation: 263451

This is a much more compact method:

c( "I", "II", "III", "IV") [ findInterval(eg[ , 1] , rev(kl[1, ] )) ] 

This uses the findInterval function to determine which category (as a number) each eg value falls into. (The findInterval function requires the breakpoints to be non-decreasing, so in this problem we need to rev()-erse the boundary values.) That number is then used to select from a character vector.

If you post an example that makes sense (rather than one with a varying number of items per row) we can demonstrate how to put this in a loop or an sapply call.

Step-by-step, which for R means from the innermost to outermost:

1) reverse the ordering of the kl vector so that it can be used as the breaks in findInterval

2) Then run the eg[,1] column through findInterval(.,.), returning an index to be used with "["

3) Extract the appropriate items from the category character vector (based on the break interval that the eg[,1] value fell into).
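The three steps can be sketched with intermediate values as follows. This is only an illustration under assumptions: the vectors `values` and `kl_row` are typed in by hand from the question's printed data (the V2 column of eg and row 1 of kl), and the label order assumes, as the question's pseudo-code suggests, that the largest values belong to class I and anything below the smallest threshold is class V.

```r
# Assumed sample data, copied by hand from the question
values <- c(0.71125, 6.07500, 0.67000)                   # V2 column of eg
kl_row <- c(I = 0.80, II = 0.60, III = 0.40, IV = 0.20)  # row 1 of kl

# 1) reverse the thresholds so they are non-decreasing
breaks <- rev(unname(kl_row))        # 0.2 0.4 0.6 0.8

# 2) interval index for each value (0 means below the smallest break)
idx <- findInterval(values, breaks)

# 3) select labels; idx + 1 avoids zero indices, which "[" silently drops
labels <- c("V", "IV", "III", "II", "I")   # lowest interval first (assumed order)
classes <- labels[idx + 1]
print(classes)
```

Listing the labels from the lowest interval upward keeps them aligned with the reversed breakpoints; if the intended mapping is the other way round, only the `labels` vector needs to change.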

The code above was posted when there wasn't a proper example. It is often necessary to put -Inf and Inf on either side of the breakpoints so that you don't get zero indices from findInterval. This code seems to work on the data frames offered now:

apply(kl,1,function(x) c( "V", "I", "II", "III", "IV") [ 
                                findInterval(eg[ , 1] , c( -Inf, rev(x ),Inf) ) ]  )
     1     2     3  
[1,] "V"   "V"   "V"
[2,] "II"  "II"  "V"
[3,] "III" "III" "V"

You will probably want to transpose this, because apply returns the result for each processed row as a column.

(I think the problem specification is still messed up, since it refers to a row 4 of a 3-row object (that also has 4 columns). The numeric values only make sense when used as rows.)
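If the intent is instead to compare row i of eg only with row i of kl (which is one possible reading of the question's loop, not something the question confirms), a row-wise pairing could look like the sketch below. The data frame `kl`, the vector `eg_v2`, and the helper `classify_row` are hypothetical names built by hand from the printed data, and the label order again assumes that larger values mean class I.

```r
# Assumed data, typed in from the question's printout
eg_v2 <- c(0.71125, 6.07500, 0.67000)
kl <- data.frame(I   = c(0.80, 0.75, 65.60),
                 II  = c(0.60, 0.55, 50.70),
                 III = c(0.40, 0.35, 38.80),
                 IV  = c(0.20, 0.15, 24.00))

# Hypothetical helper: classify one value against one row of thresholds
classify_row <- function(value, thresholds) {
  if (is.na(value)) return(NA_character_)
  # pad with -Inf/Inf so findInterval never returns a zero index
  breaks <- c(-Inf, rev(thresholds), Inf)
  c("V", "IV", "III", "II", "I")[findInterval(value, breaks)]
}

# Pair eg row i with kl row i
classes <- vapply(seq_along(eg_v2),
                  function(i) classify_row(eg_v2[i], unname(unlist(kl[i, ]))),
                  character(1))
print(classes)
```

With these inputs the third value falls below every threshold in row 3 of kl, so it ends up in the catch-all class V.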

Upvotes: 1
