asathe1

Reputation: 31

Removing a row from a data frame if the text in column 1 equals the text in column 2 (in R)

I am trying to create unique combinations of all the tickers. I have created a data frame with all the combinations. However, I want to remove the rows where both values are the same: if the ticker in row 1, column 1 equals the text in row 1, column 2, then I want to either set it to NA or remove the row. That would leave only the unique combinations.

    q <- c("BATS LN EQUITY","DGE LN EQUITY","IMB LN EQUITY","RDSB LN EQUITY")
    p <- c("GBPUSD CURNCY","GOLDS INDEX","DXY CURNCY")
    o <- expand.grid(q=q, p=p)
    o[order(o$q),]
    o <- data.frame(o)
    o$q <- as.character(o$q)
    o$p <- as.character(o$p)
    o <- data.frame(o)

    for(i in 1:nrow(o)){
      if(o[i,1] = o[i,2]){
        o[i,2] = NA
      }
    }

Upvotes: 0

Views: 107

Answers (2)

Benloper

Reputation: 458

I mostly work in Python, where the idiomatic way would be the duplicated() method in pandas, but in R I would think the unique() function would be better:

unique(o)

It is also possible to use the duplicated() function, negated so that only the first occurrence of each row is kept:

o[!duplicated(o), ]
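
One caveat worth sketching: unique() and duplicated() compare whole rows against each other, so they drop repeated rows rather than rows whose two columns hold the same text. The example below uses made-up data (the overlapping tickers and the repeated row are assumptions for illustration, not the asker's data) to show the difference.

```r
# Hypothetical data: row 1 is a self-pair (q equals p), row 3 repeats row 2.
o <- data.frame(
  q = c("BATS LN EQUITY", "DGE LN EQUITY", "DGE LN EQUITY", "IMB LN EQUITY"),
  p = c("BATS LN EQUITY", "IMB LN EQUITY", "IMB LN EQUITY", "DGE LN EQUITY"),
  stringsAsFactors = FALSE
)

# unique() drops the repeated row but keeps the self-pair in row 1.
deduped <- unique(o)

# Comparing the two columns drops the self-pair but keeps the repeated row.
no_self_pairs <- o[o$q != o$p, ]

# Combining both leaves rows that are neither repeated nor self-pairs.
both <- unique(o[o$q != o$p, ])
```

If the goal is only to remove rows where column 1 equals column 2, the column comparison is the part that does that work; unique() is useful on top of it only when repeated rows are also unwanted.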

Upvotes: 0

Taylor H

Reputation: 436

Think of it instead as keeping the rows where the two columns are not equal. Try: o[o$q != o$p,].

Your solution can work too, but you need to use == instead of = in your if. Like so:

for(i in 1:nrow(o)){
  if(o[i,1] == o[i,2]){
    o[i,2] = NA 
  }  
}

This is just slower and less idiomatic than the first approach I mentioned. The two also produce different output (the first drops the matching rows, the loop sets column 2 to NA in them), but both are among the options you said you want.
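
To make the difference in output concrete, here is a small sketch of both options side by side. The ticker lists are hypothetical and deliberately share one value so that a matching row exists; the stringsAsFactors = FALSE argument is an addition that keeps the columns as character vectors, which avoids the as.character() conversions.

```r
# Hypothetical ticker lists that share one value, so one self-pair appears.
q <- c("BATS LN EQUITY", "DGE LN EQUITY")
p <- c("DGE LN EQUITY", "GBPUSD CURNCY")

# Keep the columns as character so they can be compared directly.
o <- expand.grid(q = q, p = p, stringsAsFactors = FALSE)

# Option 1: drop the rows where both columns match.
dropped <- o[o$q != o$p, ]

# Option 2: keep every row but set column p to NA on a match
# (a vectorised equivalent of the for loop).
flagged <- o
flagged$p[flagged$q == flagged$p] <- NA

nrow(dropped)          # 3
sum(is.na(flagged$p))  # 1
```

Option 1 shrinks the data frame, while option 2 keeps its shape and marks the matches, which may matter if later code relies on row positions.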

Upvotes: 1
