How to remove duplicates from a data frame in R

Question

I have a data frame of correlation coefficients like the one below. It contains the coefficient for both a*b and b*a, which are the same. How do I remove these duplicates? Can anyone please help?

**Var1, Var2, r**
ApoA1.ng.ml.1, Apo.B.ng.ml, 0.9998438
Apo.B.ng.ml, ApoA1.ng.ml.1, 0.9998438
SLM.T0., TBW.T0., 0.9992563
TBW.T0., SLM.T0., 0.9992563
Insulin.mercdiaConc..U.L, Insulin..pg.ml, 0.9313702
Insulin..pg.ml, Insulin.mercdiaConc..U.L, 0.9313702

Tim Biegeleisen · Accepted Answer

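One option is the sqldf package. Grouping on the ordered pair, with the smaller name first and the larger second, puts a*b and b*a in the same group, so each pair comes back once: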

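library(sqldf)

# Two-argument MIN()/MAX() are scalar functions in SQLite, sqldf's default backend.
sql <- "SELECT MIN(Var1, Var2), MAX(Var2, Var1), MAX(r) AS r
        FROM df
        GROUP BY MIN(Var1, Var2), MAX(Var2, Var1)"

df_out <- sqldf(sql)
names(df_out) <- c("Var1", "Var2", "r")  # replace the expression-derived column names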
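
If you would rather not add a package, the same ordered-pair idea can be sketched in base R. This is only an illustration and not part of the answer above: the sample `df` is rebuilt from the rows in the question, and `key` and `df_unique` are made-up names. It assumes the `r` value is identical for both orderings, so keeping the first row of each pair loses nothing.

```r
# Sample data rebuilt from the question (assumption: columns are character, not factor)
df <- data.frame(
  Var1 = c("ApoA1.ng.ml.1", "Apo.B.ng.ml", "SLM.T0.", "TBW.T0.",
           "Insulin.mercdiaConc..U.L", "Insulin..pg.ml"),
  Var2 = c("Apo.B.ng.ml", "ApoA1.ng.ml.1", "TBW.T0.", "SLM.T0.",
           "Insulin..pg.ml", "Insulin.mercdiaConc..U.L"),
  r = c(0.9998438, 0.9998438, 0.9992563, 0.9992563, 0.9313702, 0.9313702),
  stringsAsFactors = FALSE
)

# Build an order-independent key: the smaller name first, the larger second.
# as.character() guards against factor columns, which pmin()/pmax() do not compare as strings.
v1 <- as.character(df$Var1)
v2 <- as.character(df$Var2)
key <- data.frame(lo = pmin(v1, v2), hi = pmax(v1, v2))

# Keep the first row of each unordered pair and drop its mirror image.
df_unique <- df[!duplicated(key), ]
```

Unlike the sqldf version, this keeps the original `Var1`/`Var2` order of whichever row appears first instead of sorting the two names within each row.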
