Mahan

Reputation: 109

Delete the duplicate observations if a column contains particular text in R dataframe

I have an R dataframe that has duplicate rows as follows:

X1         X10
rs6908903  chr6
rs6908903  chr6_GL000251v2_alt
rs6908903  chr6_GL000252v2_alt
rs6908903  chr6_GL000252v2_alt
rs6908903  chr6_GL000252v2_alt

In this case, I want to create a new data frame that keeps only the first row (chr6) and drops every row whose X10 value contains the string GL000.

Thanks in advance

Upvotes: 2

Views: 130

Answers (2)

cgvoller

Reputation: 879

Just to offer an alternative, here is one using dplyr and stringr:

library(dplyr)
library(stringr)
df <- data.frame(
  X1  = c("rs6908903", "rs6908903", "rs6908903", "rs6908903", "rs6908903"),
  X10 = c("chr6", "chr6_GL000251v2_alt", "chr6_GL000252v2_alt",
          "chr6_GL000252v2_alt", "chr6_GL000252v2_alt")
)

df %>% filter(!str_detect(X10, 'GL000'))

Output:

         X1  X10
1 rs6908903 chr6

Edit: alternatively, drop every row whose X10 contains an underscore:

df %>% 
  dplyr::filter(!grepl('_', X10))

Output:

         X1  X10
1 rs6908903 chr6
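
The two patterns give the same result on the sample data, but they are not equivalent in general. A minimal base R sketch of the difference, using the same grepl predicate on a plain character vector (the value "chr6_KI270758v1_alt" is not in the question and is added here purely as an illustration):

```r
# Hypothetical X10 values; the third entry is an assumed example, not from the question.
x10 <- c("chr6", "chr6_GL000251v2_alt", "chr6_KI270758v1_alt")

# fixed = TRUE matches the pattern as literal text rather than as a regex.
keep_gl <- !grepl("GL000", x10, fixed = TRUE)  # drops only names containing "GL000"
keep_us <- !grepl("_", x10, fixed = TRUE)      # drops every name containing "_"

x10[keep_gl]
x10[keep_us]
```

If only the GL000 contigs should go, the first pattern is the safer choice; the underscore pattern also removes any other suffixed contig name.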

Upvotes: 1

U13-Forward

Reputation: 71610

Try grepl:

df[!grepl('GL000', df$X10), ]
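
A slightly fuller base R sketch, assuming the column names X1 and X10 from the question. It rebuilds the sample data, drops the GL000 rows, and then keeps the first remaining row per X1 in case duplicates survive the filter. The name df_clean is only illustrative.

```r
# Sample data as shown in the question.
df <- data.frame(
  X1  = rep("rs6908903", 5),
  X10 = c("chr6", "chr6_GL000251v2_alt", "chr6_GL000252v2_alt",
          "chr6_GL000252v2_alt", "chr6_GL000252v2_alt")
)

# Drop rows whose X10 contains the literal text "GL000".
df_clean <- df[!grepl("GL000", df$X10, fixed = TRUE), , drop = FALSE]

# Keep only the first remaining row for each X1 value.
df_clean <- df_clean[!duplicated(df_clean$X1), , drop = FALSE]

df_clean
```

The duplicated() step is optional for this sample, since only one row is left after filtering, but it guards against an ID that appears on several non-GL000 rows.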

Upvotes: 1
