Reputation: 488
I want to remove rows from my dataframe in which two particular columns both contain the same string. I know that I can remove a row when a single column contains a particular string with:
abs_pres_matrix[!grepl("BGC", abs_pres_matrix$Genome),]
In this case I have a dataframe such as:
GC1 | GC2 | Distance
BGC123 | BGC23 | 0.5
BGC123 | MBT_13 | 0.6
BGC134 | MBT_13 | 0.5
BGC123 | BGC134 | 0.6
Desired output:
GC1 | GC2 | Distance
BGC123 | MBT_13 | 0.6
BGC134 | MBT_13 | 0.5
In other words, I want to remove the rows in which both GC1 and GC2 contain the string "BGC".
Upvotes: 0
Views: 71
Reputation: 21442
This solution works in three steps: (i) each row is pasted into a single string using apply and paste0; (ii) each string is searched for a repeated occurrence of the pattern BGC using a regex with a backreference (\\1); (iii) the rows that match are removed from the dataframe using -which (or, alternatively, just !):
df[-which(grepl("(BGC).*\\1", apply(df, 1, paste0, collapse = " "))),]
GC1 GC2 Distance
2 BGC123 MBT_13 0.6
3 BGC134 MBT_13 0.5
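A caveat on step (iii): if no row matches, which() returns integer(0), and indexing with -integer(0) selects no rows at all, so the whole data frame disappears instead of being kept. Negating the logical vector with ! does not have this problem. A minimal sketch, using hypothetical data (df_none) in which no row contains "BGC" twice:

```r
# Hypothetical data: no row contains "BGC" in both columns
df_none <- data.frame(
  GC1 = c("AAA1", "BGC9"),
  GC2 = c("MBT_1", "MBT_2"),
  Distance = c(0.5, 0.6),
  stringsAsFactors = FALSE
)

# Steps (i) and (ii): one string per row, then look for "BGC" twice via \\1
rows <- apply(df_none, 1, paste0, collapse = " ")
hits <- grepl("(BGC).*\\1", rows)   # no row matches

# -which(): which() gives integer(0), and df[-integer(0), ] keeps no rows
res_which <- df_none[-which(hits), ]
nrow(res_which)   # 0

# !: negating the logical vector keeps every non-matching row
res_not <- df_none[!hits, ]
nrow(res_not)     # 2
```

For that reason the ! form is usually the safer choice for step (iii).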
Upvotes: 1
Reputation: 804
Your data.frame:
df <- data.frame(
  GC1 = c("BGC123","BGC123","BGC134","BGC123"),
  GC2 = c("BGC123","MBT_13","MBT_13","BGC123"),
  Distance = c(0.5, 0.6, 0.5, 0.6),
  stringsAsFactors = FALSE  # use FALSE, not F (F can be reassigned)
)
If you just want to delete the rows where GC2 contains "BGC", use grepl:
df[!grepl("BGC", df$GC2) , ]
#or
subset(df, !grepl("BGC", df$GC2))
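The grepl call above tests GC2 only. To drop a row only when both columns contain "BGC", as the question asks, two grepl results can be combined with &. A minimal sketch, assuming the example data from the question (with the "BGC 134" cell read as "BGC134"):

```r
# Example data from the question
df <- data.frame(
  GC1 = c("BGC123", "BGC123", "BGC134", "BGC123"),
  GC2 = c("BGC23", "MBT_13", "MBT_13", "BGC134"),
  Distance = c(0.5, 0.6, 0.5, 0.6),
  stringsAsFactors = FALSE
)

# TRUE where *both* columns contain "BGC"; grepl() is vectorised,
# so each column is tested in one call without apply()
both_bgc <- grepl("BGC", df$GC1) & grepl("BGC", df$GC2)

df[!both_bgc, ]   # keeps rows 2 and 3 (the MBT_13 rows)
```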
If you want to eliminate the rows where GC1 is exactly equal to GC2, you can use subset with apply (note the !, which drops the matching rows instead of keeping them):
subset(df, !apply(df, 1, function(x) x[1] %in% x[2]))
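For exact equality, apply() is not strictly needed: it converts the data frame to a character matrix and loops over rows, while != compares the two columns element by element in one step. A minimal sketch using the data.frame defined in this answer:

```r
df <- data.frame(
  GC1 = c("BGC123", "BGC123", "BGC134", "BGC123"),
  GC2 = c("BGC123", "MBT_13", "MBT_13", "BGC123"),
  Distance = c(0.5, 0.6, 0.5, 0.6),
  stringsAsFactors = FALSE
)

# Vectorised comparison: keep rows where GC1 and GC2 differ
kept <- subset(df, GC1 != GC2)
kept   # rows 2 and 3
```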
Upvotes: 2
Reputation: 19394
library(dplyr)

# keep rows in which at least one GC column does NOT contain "BGC"
df %>%
  filter_at(vars(starts_with("GC")), any_vars(!grepl("BGC", .)))
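The same "every GC column contains BGC" test can be written in base R for any number of columns whose names start with "GC". A minimal sketch, assuming the example data from the question; startsWith() plays the role of starts_with(), and Reduce() ANDs the per-column results together:

```r
df <- data.frame(
  GC1 = c("BGC123", "BGC123", "BGC134", "BGC123"),
  GC2 = c("BGC23", "MBT_13", "MBT_13", "BGC134"),
  Distance = c(0.5, 0.6, 0.5, 0.6),
  stringsAsFactors = FALSE
)

# All columns whose name starts with "GC"
gc_cols <- df[startsWith(names(df), "GC")]

# One logical vector per GC column, combined row by row with &
all_bgc <- Reduce(`&`, lapply(gc_cols, grepl, pattern = "BGC"))

df[!all_bgc, ]   # rows 2 and 3
```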
Upvotes: 0
Reputation: 73602
Using grep:
abs_pres_matrix[!lengths(apply(abs_pres_matrix[, 1:2], 1, grep, pattern="BGC")) > 1,]
# GC1 GC2 Distance
# 2 BGC123 MBT_13 0.6
# 3 BGC134 MBT_13 0.5
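One caveat with this approach: apply() returns a list only when the per-row results have different lengths. If every row matches exactly twice, apply() simplifies the result to a matrix, and lengths() then no longer gives one count per row. A minimal sketch with hypothetical data in which every row matches twice, plus a grepl()/rowSums() count that does not depend on that simplification:

```r
# Hypothetical data: every row has "BGC" in both columns
abs_pres_matrix <- data.frame(
  GC1 = c("BGC1", "BGC2"),
  GC2 = c("BGC3", "BGC4"),
  Distance = c(0.5, 0.6),
  stringsAsFactors = FALSE
)

# Each grep() call returns c(1, 2), so apply() builds a 2 x 2 matrix
m <- apply(abs_pres_matrix[, 1:2], 1, grep, pattern = "BGC")
dim(m)   # 2 2

# cbind() always yields an n x 2 logical matrix, so rowSums() counts per row
hit_mat <- do.call(cbind, lapply(abs_pres_matrix[, 1:2], grepl, pattern = "BGC"))
n_hits <- rowSums(hit_mat)
abs_pres_matrix[n_hits < 2, ]   # no rows left, as intended
```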
Upvotes: 1