Saltaf
Saltaf

Reputation: 65

Comparing each element in two columns and set another column

I have a data frame (after fread from a file) with two columns (dep and label). I want to set another column (mark) with id value depending on the match. If the 'dep' entry matches 'lablel' entry, mark get the 'id' of the matched 'label'. For no match, mark get the value of its own 'id'. Currently, I have work around solution with loops but I know there should be a neat way to do it in R specifics.

trace <- data.table(id=seq(1:7),dep=c(-1,45,40,47,0,45,43), 
label=c(99,40,43,45,47,42,48), mark=rep("",7))
   id dep label mark
1:  1  -1    99  1    
2:  2  45    40  2  
3:  3  40    43  2   
4:  4  47    45  4  
5:  5  0     47  5   
6:  6  45    42  4  
7:  7  43    48  3   

I know loops are slow in r and just to give example the following naive for/while works for small sizes but my data set is huge.

trace$mark <- trace$id
for (i in 1:length(trace$id)){
val <- trace$dep[i]
j <- 1
while(j<=i && val !=-1 && val!=0){ // don't compare if val is -1/0

    if(val==trace$label[j]){
            trace$mark[i] <- trace$id[j]
                    }
 j <-j +1
}
}

I have also tried using the following approach but it works only if there is a single match.

match <- which(trace$dep %in% trace$label)
match_to <- which(trace$label %in% trace$dep)
trace$mark[match] <- trace$mark[match_to] 

Upvotes: 0

Views: 212

Answers (1)

A Gore
A Gore

Reputation: 1910

This solution might help:

trace[trace[,.(id,dep=label)],mark:=as.character(i.id),on="dep"]
trace[mark=="",mark:=as.character(id)]
# id dep label mark
# 1:  1  -1    99    1
# 2:  2  45    40    4
# 3:  3  -1    43    3
# 4:  4  47    45    5
# 5:  5  -1    47    5
# 6:  6  45    42    4
# 7:  7  43    48    3

Update:

To make sure you are not matching dep with 0 or -1 values you can just add another line.

trace[dep %in% c(0,-1), mark:= as.character(id)]

OR Try this:

trace[trace[!dep %in% c(0,-1),.(id,dep=label)],mark:=as.character(i.id),on="dep"]
trace[mark=="",mark:=as.character(id)]

The solution that worked

trace[trace[,.(id,dep=label)],on=.(id<=id,dep),mark:=as.char‌​acter(i.id),allow.ca‌​rtesian=TRUE]

Upvotes: 1

Related Questions