Delete rows depending on column values in other rows (R)

I have the following data frame with multiple observations:


     CHR            END         START          REF         ALT
      1            1445         1446            G           A
      1            1445         1446            A           G
      3            2787         2787            T           -
      3            2787         2787            -           T

I want to delete a row if its REF column is - and its ALT value matches the REF value of another row, while the other columns remain equal.

In my example, this is the desired output:

     CHR            END         START          REF         ALT
      1            1445         1446            G           A
      1            1445         1446            A           G
      3            2787         2787            T           -
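A minimal base R sketch of the rule as stated, which does not rely on row order. It assumes that "the other columns" means CHR, END and START, and it rebuilds the example with data.frame() and stringsAsFactors = FALSE so that REF and ALT are compared as character strings. The names ref_key, alt_key and drop are only illustrative.

```r
# Example data from the question, with REF/ALT kept as character
d <- data.frame(
  CHR   = c(1, 1, 3, 3),
  END   = c(1445, 1445, 2787, 2787),
  START = c(1446, 1446, 2787, 2787),
  REF   = c("G", "A", "T", "-"),
  ALT   = c("A", "G", "-", "T"),
  stringsAsFactors = FALSE
)

# Position (CHR, END, START) combined with each row's REF allele
ref_key <- paste(d$CHR, d$END, d$START, d$REF, sep = "_")
# The same position combined with each row's ALT allele
alt_key <- paste(d$CHR, d$END, d$START, d$ALT, sep = "_")

# Drop a row when its REF is "-" and its ALT equals the REF of a row at the
# same position; only rows whose own REF is not "-" count as matches
drop <- d$REF == "-" & alt_key %in% ref_key[d$REF != "-"]
result <- d[!drop, ]
result
```

Because it matches on keys rather than on neighbouring rows, this version would also work if the rows were not sorted.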

I'm not sure how to relate the indices of different rows.

In the data frame, the row to delete always follows its "mother" row.
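Since the row to delete always follows its "mother" row, another option is to compare each row with the one directly above it. This is a base R sketch under that assumption (again treating CHR, END and START as the columns that must match); prev() is a hypothetical helper defined here, not a base R function.

```r
d <- data.frame(
  CHR   = c(1, 1, 3, 3),
  END   = c(1445, 1445, 2787, 2787),
  START = c(1446, 1446, 2787, 2787),
  REF   = c("G", "A", "T", "-"),
  ALT   = c("A", "G", "-", "T"),
  stringsAsFactors = FALSE
)

# Shift a column down by one so each row sees the previous row's value
# (the first row gets NA because it has no predecessor)
prev <- function(x) c(NA, x[-length(x)])

drop <- d$REF == "-" &
  d$ALT   == prev(d$REF) &
  d$CHR   == prev(d$CHR) &
  d$END   == prev(d$END) &
  d$START == prev(d$START)
# A comparison with NA gives NA; keep such rows
drop[is.na(drop)] <- FALSE

result <- d[!drop, ]
result
```

This only works if the "mother" row really is the immediately preceding row; if that ordering is not guaranteed, a key-based match is safer.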


Answers (1)

Roman

You can try:

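library(tidyverse)
d %>% 
  # build a key from REF and ALT, e.g. "T_-"
  unite(tmp, REF, ALT, remove = FALSE) %>% 
  # sort the two alleles so that "T_-" and "-_T" get the same key
  mutate(tmp = strsplit(tmp, "_") %>% map_chr(function(x) paste(sort(x), collapse = "_"))) %>% 
  group_by(CHR, END, START, tmp) %>% 
  # number the rows in groups whose key contains "-"; all other rows get 1
  mutate(n = ifelse(grepl("-", tmp), seq_len(n()), 1)) %>% 
  # keep only the first row of each group (the "mother" row)
  filter(n == 1) %>% 
  ungroup() %>% 
  select(-tmp, -n)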
# A tibble: 3 x 5
    CHR   END START REF   ALT  
  <int> <int> <int> <fct> <fct>
1     1  1445  1446 G     A    
2     1  1445  1446 A     G    
3     3  2787  2787 T     - 

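The idea is to add an identifier tmp that holds the sorted REF and ALT values (built with strsplit and map_chr), so that a row and its mirrored counterpart (e.g. T/- and -/T) get the same key. Within each CHR/END/START/tmp group whose key contains a -, the rows are numbered and only the first one is kept; rows whose key has no - are always kept.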
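For comparison, here is a base R sketch of the same sorted-key idea, not the answer's own code: pmin() and pmax() put the two alleles of each row in a fixed order (replacing strsplit and map_chr), and duplicated() replaces the numbering plus filter(n == 1). It assumes REF and ALT are character columns; key, has_gap and drop are illustrative names.

```r
d <- data.frame(
  CHR   = c(1, 1, 3, 3),
  END   = c(1445, 1445, 2787, 2787),
  START = c(1446, 1446, 2787, 2787),
  REF   = c("G", "A", "T", "-"),
  ALT   = c("A", "G", "-", "T"),
  stringsAsFactors = FALSE
)

# Order-independent key: T/- and -/T (or G/A and A/G) produce the same string
key <- paste(d$CHR, d$END, d$START,
             pmin(d$REF, d$ALT), pmax(d$REF, d$ALT), sep = "_")
# Only pairs involving a "-" are candidates for removal
has_gap <- d$REF == "-" | d$ALT == "-"
# duplicated() is FALSE for the first row of each key, so the "mother" row stays
drop <- has_gap & duplicated(key)

result <- d[!drop, ]
result
```

G/A and A/G share a key but contain no -, so both are kept, just as in the tidyverse version.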

The data

d <- read.table(text=" CHR            END         START          REF         ALT

                1            1445         1446            G           A
                1            1445         1446            A           G
                3            2787         2787            T           -
                3            2787         2787            -           T", header = TRUE)
