useR

Reputation: 3082

Output column index based on each row in data.frame

I have a data.frame

Orig <- c("HKG", "PEK", "PVG", "AMS")
stop2 <- c("", "HKG", "PEK", "HKG")
stop3 <- c("", "", "HKG", "")
Dest <- "X"
(dat <- data.frame(Orig, stop2, stop3, Dest))

  Orig stop2 stop3 Dest
1  HKG                X
2  PEK   HKG          X
3  PVG   PEK   HKG    X
4  AMS   HKG          X

For each row, I would like to output the column index where "HKG" appears. For example, in the second row "HKG" is in stop2, which is the 2nd column, so the output for that row should be 2.

The desired output is like this:

  Orig stop2 stop3 Dest output
1  HKG                X      1
2  PEK   HKG          X      2
3  PVG   PEK   HKG    X      3
4  AMS   HKG          X      2

My initial idea was to use which(... == "HKG"), but I only know how to do that for column names.
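The which() idea does work on a single row. Comparing one row with "HKG" gives a one-row logical matrix, and which() returns the position of the TRUE, which is the column index. Below is a minimal sketch that rebuilds the example data. It assumes stringsAsFactors = FALSE so the columns are character in any R version.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps the columns as character
dat <- data.frame(
  Orig  = c("HKG", "PEK", "PVG", "AMS"),
  stop2 = c("", "HKG", "PEK", "HKG"),
  stop3 = c("", "", "HKG", ""),
  Dest  = "X",
  stringsAsFactors = FALSE
)

# One row compared with "HKG" is a 1 x 4 logical matrix,
# so the position of the TRUE equals the column index
which(dat[2, ] == "HKG")   # returns 2 (stop2 is the 2nd column)
```

The remaining step is repeating this for every row, which is what the answers below do.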

Upvotes: 1

Views: 154

Answers (3)

thelatemail

Reputation: 93803

Use apply() across the rows:

dat$output <- apply(dat[, -4], 1, function(x) which(x == "HKG"))
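The apply() version assumes every row contains exactly one "HKG". A row with no match makes which() return integer(0), and apply() then returns a list instead of a vector. The sketch below is a hedged variant that swaps which() for match(), which returns the first matching position or NA. Two things in it are illustrative only: first_hkg_col is a hypothetical helper name, and the fifth row (LHR/CDG) is made-up data that never visits HKG.

```r
dat <- data.frame(
  Orig  = c("HKG", "PEK", "PVG", "AMS", "LHR"),  # 5th row is hypothetical
  stop2 = c("", "HKG", "PEK", "HKG", "CDG"),
  stop3 = c("", "", "HKG", "", ""),
  Dest  = "X",
  stringsAsFactors = FALSE
)

# Hypothetical helper: index of the first column equal to `value` in each row,
# or NA when the row has no match
first_hkg_col <- function(df, value = "HKG") {
  unname(apply(df, 1, function(x) match(value, x)))
}

dat$output <- first_hkg_col(dat[, -4])
dat$output   # 1 2 3 2 NA
```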

Or, if speed matters, try the following vectorised version, which should be about 20x faster:

intm <- dat[-4] == "HKG"   # logical matrix: TRUE where "HKG" appears
# column index of each TRUE, reordered by row
dat$output <- col(intm)[intm][order(row(intm)[intm])]
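Printing the intermediate pieces shows why the vectorised version works. col() and row() give the column and row number of every cell. Indexing them with the logical matrix keeps only the "HKG" cells, in column-major order. order() then puts those cells back in row order. Here is a sketch of those steps on the question's data:

```r
dat <- data.frame(
  Orig  = c("HKG", "PEK", "PVG", "AMS"),
  stop2 = c("", "HKG", "PEK", "HKG"),
  stop3 = c("", "", "HKG", ""),
  Dest  = "X",
  stringsAsFactors = FALSE
)

intm <- dat[-4] == "HKG"     # logical matrix, TRUE where "HKG" appears
cols <- col(intm)[intm]      # column of each TRUE, scanned column by column: 1 2 2 3
rows <- row(intm)[intm]      # row of each of those TRUEs:                    1 2 4 3
cols[order(rows)]            # reordered by row: 1 2 3 2
```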

Or even simpler:

dat$output <- max.col(dat[-4] == "HKG")

All three approaches give:

#  Orig stop2 stop3 Dest output
#1  HKG                X      1
#2  PEK   HKG          X      2
#3  PVG   PEK   HKG    X      3
#4  AMS   HKG          X      2
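max.col() picks the column holding each row's maximum, so on a TRUE/FALSE matrix it finds the single TRUE. A row without "HKG" is all ties, however, and max.col() still returns a column for it (a random one under the default ties.method). The sketch below flags such rows as NA using rowSums(). As before, the fifth row is hypothetical data added for illustration.

```r
dat <- data.frame(
  Orig  = c("HKG", "PEK", "PVG", "AMS", "LHR"),  # 5th row is hypothetical
  stop2 = c("", "HKG", "PEK", "HKG", "CDG"),
  stop3 = c("", "", "HKG", "", ""),
  Dest  = "X",
  stringsAsFactors = FALSE
)

hits <- dat[-4] == "HKG"                      # logical matrix
out <- max.col(hits, ties.method = "first")   # column of the (first) TRUE in each row
out[rowSums(hits) == 0] <- NA                 # no "HKG" means all ties: mark as NA
out   # 1 2 3 2 NA
```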

Upvotes: 2

akrun

Reputation: 886938

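# TRUE becomes its column number in dat, FALSE becomes 0
indx <- (t(dat) == "HKG") * seq_len(ncol(dat))
# keep the non-zero entries; column-major order follows the rows of dat
indx[!!indx]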
#[1] 1 2 3 2
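This trick works in three steps:

1. Comparing t(dat) with "HKG" puts each original row into its own column.
2. Multiplying by the column numbers turns every TRUE into the index of the column it came from in dat.
3. Reading the non-zero entries in column-major order returns them in row order.

The multiplier has to be seq_len(ncol(df)), one value per column of the data frame. seq_len(nrow(df)) only gives the same numbers when the data frame happens to be square, as the 4 x 4 example is. The sketch below wraps this in a hypothetical helper, hkg_col(). It assumes each row contains the value exactly once; rows without a match are silently dropped.

```r
dat <- data.frame(
  Orig  = c("HKG", "PEK", "PVG", "AMS"),
  stop2 = c("", "HKG", "PEK", "HKG"),
  stop3 = c("", "", "HKG", ""),
  Dest  = "X",
  stringsAsFactors = FALSE
)

# Hypothetical helper: column index of `value` in each row of `df`
hkg_col <- function(df, value = "HKG") {
  hits <- t(df) == value              # rows of hits = columns of df
  idx <- hits * seq_len(ncol(df))     # TRUE -> its column number, FALSE -> 0
  idx[idx != 0]                       # column-major read: one entry per row of df
}

hkg_col(dat)          # 1 2 3 2
hkg_col(dat[1:3, ])   # 1 2 3 (3 rows x 4 columns, not square)
```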

Upvotes: 2

Jota

Reputation: 17611

You can use which() together with t(), though @thelatemail's answer is more intuitive:

dat$output <- which(t(dat) == "HKG", arr.ind = TRUE)[, 1]

# This next line does the same thing and is perhaps clearer than using [, 1]:
# dat$output <- which(t(dat) == "HKG", arr.ind = TRUE)[, "row"]

dat

#  Orig stop2 stop3 Dest output
#1  HKG                X      1
#2  PEK   HKG          X      2
#3  PVG   PEK   HKG    X      3
#4  AMS   HKG          X      2
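The following variant is not part of the answer above; it is a sketch that skips the transpose. It calls which(..., arr.ind = TRUE) on the untransposed comparison. It then sorts the matches by row explicitly instead of relying on which() scanning column by column. Like the original, it assumes one "HKG" per row.

```r
dat <- data.frame(
  Orig  = c("HKG", "PEK", "PVG", "AMS"),
  stop2 = c("", "HKG", "PEK", "HKG"),
  stop3 = c("", "", "HKG", ""),
  Dest  = "X",
  stringsAsFactors = FALSE
)

# Each row of `pos` is one match, with columns "row" and "col"
pos <- which(as.matrix(dat) == "HKG", arr.ind = TRUE)
# Sort the matches by row of dat, then keep their column indices
pos <- pos[order(pos[, "row"]), , drop = FALSE]
dat$output <- unname(pos[, "col"])
dat$output   # 1 2 3 2
```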

Upvotes: 3
