Reputation: 19
I've got 2 files:
eg
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 0.066 0.71125 1.77 0.5045 0.7417104 1.584007 0.872757 1.729945 4
2 0.500 6.07500 20.30 1.7500 9.5017100 17.255490 11.180490 6.388851 4
3 0.670 0.67000 0.67 0.6700 0.0000000 0.670000 0.670000 0.000000 1
and
kl
I II III IV
1 0.80 0.60 0.40 0.20
2 0.75 0.55 0.35 0.15
3 65.60 50.70 38.80 24.00
I'd like to compare every V2 value in "eg" with the corresponding row of "kl", because I need to determine which class each row falls into (I, II, III, IV, or V for the rest) and put the result in a new column. E.g.
if eg[1,2] >= kl[1,1] --> I
if eg[1,2] >= kl[2,1] --> II
if eg[1,2] >= kl[3,1] --> III
if eg[1,2] >= kl[4,1] --> IV
else --> V
and the same for eg[2,2] and eg[3,2].
I prepared my first loop with an iteration like this, but (of course) it doesn't work:
eg <- read.csv("eg.csv", header=F, sep=";")
eg <- eg[, -c(1,3,4,5,6,7,8,9)]
eg <- t(eg)
as.numeric(as.character(eg))
for (i in eg) {
  if (is.na(eg[i,1]) || eg[i,1] == "NA") {
    cat(("0"), sep=";")
  } else if (eg[i,1] >= kl[i,1]) {
    cat(("1"), sep=";")
  } else if (eg[i,1] >= kl[i,2]) {
    cat(("2"), sep=";")
  } else if (eg[i,1] >= kl[i,3]) {
    cat(("3"), sep=";")
  } else if (eg[i,1] >= kl[i,4]) {
    cat(("4"), sep=";")
  } else {
    cat(("5"), sep=";")
    next
  }
}
R returns correct values only for the first two lines, then it prints:
00Error in if (is.na(eg[i, 1]) || eg[i, 1] == "NA") { :
missing value where TRUE/FALSE needed
But when I did the same for each row separately, it worked. Now it doesn't :(
Please, help me! And thank you!
Upvotes: 0
Views: 129
Reputation: 263451
This is a much more compact method:
c( "I", "II", "III", "IV") [ findInterval(eg[ , 1] , rev(kl[1, ] )) ]
This uses the findInterval function to determine which category (as a number) each eg value is in. (The findInterval function requires these to be non-decreasing, so in this problem we need to rev()-erse the boundary values.) That is then used to select from a character vector.
If you post an example that makes sense (rather than one with a varying number of items per row) we can demonstrate how to put it in a loop or an sapply call.
Step-by-step, which for R means from the innermost to outermost:
1) Reverse the ordering of the kl vector so that it can be used as the breaks in findInterval
2) Then run the eg[,1] column through findInterval(.,.), returning an index to be used with "["
3) Extract the appropriate items from the category character vector (based on the break interval that the eg[,1] value fell into).
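The three steps above can be sketched as separate statements. This is only an illustration: the values are copied by hand from the question (the first column of eg and the first row of kl), and the names eg_values, kl_row, breaks, idx and result are made up for the example.

```r
# Illustrative data typed in from the question (assumed, not read from file)
eg_values <- c(0.066, 0.500, 0.670)       # first column of eg
kl_row    <- c(0.80, 0.60, 0.40, 0.20)    # first row of kl

# 1) findInterval() needs non-decreasing breaks, so reverse the thresholds
breaks <- rev(kl_row)                     # 0.2 0.4 0.6 0.8

# 2) interval number for each value; 0 means "below the lowest break"
idx <- findInterval(eg_values, breaks)

# 3) use the interval numbers to pick items from the label vector
labels <- c("I", "II", "III", "IV")
result <- labels[idx]                     # a zero index is silently dropped

print(idx)
print(result)
```

Note that the first value falls below the lowest break, so its index is 0 and it simply disappears from the result, which is shorter than the input.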
The code above was posted when there wasn't a proper example. Often you need to put -Inf and Inf on either side of the breakpoints so that you don't get zero indices from findInterval. This code seems to work on the dataframes offered now:
apply(kl,1,function(x) c( "V", "I", "II", "III", "IV") [
findInterval(eg[ , 1] , c( -Inf, rev(x ),Inf) ) ] )
1 2 3
[1,] "V" "V" "V"
[2,] "II" "II" "V"
[3,] "III" "III" "V"
You will probably want to transpose this, because the apply function returns the results of row-wise processing in column order.
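A minimal sketch of that transposition, assuming kl is rebuilt by hand as a data frame from the question and eg_values stands in for eg[ , 1] (both names are only for this example):

```r
# Data typed in from the question (assumed layout)
eg_values <- c(0.066, 0.500, 0.670)
kl <- data.frame(I   = c(0.80, 0.75, 65.60),
                 II  = c(0.60, 0.55, 50.70),
                 III = c(0.40, 0.35, 38.80),
                 IV  = c(0.20, 0.15, 24.00))

# Same call as above: one column of results per row of kl
res <- apply(kl, 1, function(x)
  c("V", "I", "II", "III", "IV")[findInterval(eg_values, c(-Inf, rev(x), Inf))])

# After t(), each row belongs to one row of kl and each column to one eg value
res_t <- t(res)
print(res_t)
```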
(I think the problem specification is still messed up, since it refers to a row 4 of a 3-row object (that also has 4 columns). The numeric values only make sense when used as rows.)
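If the intent is what the loop in the question suggests, namely that row i of eg is compared only with row i of kl, and that a value at or above the largest threshold is class I, a sketch could look like the following. Both points are assumptions about the question, not something it states unambiguously. Under that reading the label vector runs in the opposite order from the one used above. The helper classify_row and the object eg_v2 are hypothetical names, and the thresholds in each kl row are assumed to be in decreasing order.

```r
# V2 values and thresholds typed in from the question (assumed layout)
eg_v2 <- c(0.71125, 6.07500, 0.67000)
kl <- matrix(c( 0.80,  0.60,  0.40,  0.20,
                0.75,  0.55,  0.35,  0.15,
               65.60, 50.70, 38.80, 24.00),
             nrow = 3, byrow = TRUE,
             dimnames = list(NULL, c("I", "II", "III", "IV")))

# Hypothetical helper: class of one value against one row of thresholds.
# Labels run from "V" (below every threshold) up to "I" (at or above the largest).
classify_row <- function(value, thresholds) {
  if (is.na(value)) return(NA_character_)
  labels <- c("V", "IV", "III", "II", "I")
  labels[findInterval(value, c(-Inf, rev(thresholds), Inf))]
}

# Pair the i-th value with the i-th row of kl
classes <- vapply(seq_along(eg_v2),
                  function(i) classify_row(eg_v2[i], kl[i, ]),
                  character(1))
print(classes)
```

Iterating over seq_along(eg_v2) rather than over the values themselves keeps i a valid row index, which the for (i in eg) loop in the question does not.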
Upvotes: 1