Reputation: 839
I have a data.frame that looks like this:
20021 K08975 K09735 0.929
20022 K08979 K09735 0.934
20023 K09140 K09735 0.901
20024 K09142 K09735 0.938
20025 K09152 K09735 0.947
20026 K09482 K09735 0.919
20027 K09716 K09735 0.944
20028 K09723 K09735 0.949
20029 K09726 K09735 0.915
20030 K06875 K09736 0.905
20031 K09149 K09736 0.901
20032 K09721 K09736 0.903
20033 OTU0001 K09738 0.908
20034 OTU0095 K09738 0.906
20035 K00952 K09738 0.904
20036 K01622 K09738 0.907
20037 K06875 K09738 0.912
20038 K06963 K09738 0.923
20039 K07060 K09738 0.934
There are three columns: var1, var2 and corr. var1 and var2 can take values of the form "K0XXXX" or "OTUXXXX".
I would like to keep only the rows where var1 and var2 are of different types, that is, the rows that pair K0XXXX with OTUXXXX or OTUXXXX with K0XXXX.
Upvotes: 1
Views: 65
Reputation: 886968
We can also do this in base R by comparing the first two characters of the first two columns:
df[Reduce(`!=`, lapply(df[1:2], substr, 1, 2)),]
# var1 var2 corr
#20033 OTU0001 K09738 0.908
#20034 OTU0095 K09738 0.906
Upvotes: 1
Reputation: 388817
Probably something like this:
subset(df, (grepl("^K0", var1) & grepl("^OTU", var2)) |
           (grepl("^OTU", var1) & grepl("^K0", var2)))
# var1 var2 corr
#20033 OTU0001 K09738 0.908
#20034 OTU0095 K09738 0.906
Or using startsWith:
subset(df, (startsWith(var1, "K0") & startsWith(var2, "OTU")) |
           (startsWith(var1, "OTU") & startsWith(var2, "K0")))
Or, using dplyr, we can use grepl/str_detect with filter:
library(dplyr)
library(stringr)
df %>%
  filter((str_detect(var1, "^K0") & str_detect(var2, "^OTU")) |
         (str_detect(var1, "^OTU") & str_detect(var2, "^K0")))
Data:
df <- structure(list(var1 = c("K08975", "K08979", "K09140", "K09142",
"K09152", "K09482", "K09716", "K09723", "K09726", "K06875", "K09149",
"K09721", "OTU0001", "OTU0095", "K00952", "K01622", "K06875",
"K06963", "K07060"), var2 = c("K09735", "K09735", "K09735", "K09735",
"K09735", "K09735", "K09735", "K09735", "K09735", "K09736", "K09736",
"K09736", "K09738", "K09738", "K09738", "K09738", "K09738", "K09738",
"K09738"), corr = c(0.929, 0.934, 0.901, 0.938, 0.947, 0.919,
0.944, 0.949, 0.915, 0.905, 0.901, 0.903, 0.908, 0.906, 0.904,
0.907, 0.912, 0.923, 0.934)), row.names = 20021:20039, class =
"data.frame")
Upvotes: 1
Reputation: 9485
Maybe this is naive, but could help:
# keep only the rows where the first two characters of var1 and var2
# are different
df[substr(df$var1, 1, 2) != substr(df$var2, 1, 2), ]
#         var1   var2  corr
#20033 OTU0001 K09738 0.908
#20034 OTU0095 K09738 0.906
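One caveat with comparing the first two characters: it treats "K0" and "K1" as different, so it only works as long as every K identifier starts with "K0", as in the sample shown. If that might not hold in the full data (an assumption, since the question only shows K0 values), a possible variation is to compare the letter prefix instead. The sketch below uses a small sample copied from the question and a hypothetical helper named id_type, which is not part of any package:

```r
# Small sample in the same shape as the question's data (values copied from it).
df <- data.frame(
  var1 = c("K09721", "OTU0001", "OTU0095", "K00952"),
  var2 = c("K09736", "K09738", "K09738", "K09738"),
  corr = c(0.903, 0.908, 0.906, 0.904),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop the trailing digits so that only the letter
# prefix ("K" or "OTU") remains. as.character() guards against factor columns.
id_type <- function(x) sub("[0-9]+$", "", as.character(x))

# Keep the rows whose two identifiers have different prefixes.
mixed <- df[id_type(df$var1) != id_type(df$var2), ]
mixed
```

On this sample it keeps only the two rows whose var1 is an OTU identifier, the same rows the substr version selects.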
Upvotes: 2