Afada

Reputation: 17

R - Extracting duplicates to a dataframe

I need help with R. Similar to the question "filtering-a-dataframe-showing-only-duplicates", I wish to extract the duplicates from a dataframe with over 2,000 entries.

The first 15 rows of the data look like this:

run id Diff
1 20 0
1 4 1024
1 4 1
1 4 1
1 4 65
1 4 1
1 4 1
1 11 475
1 11 1
1 11 1
2 25 0
2 18 0
2 18 1
2 18 1
2 18 1

I wish to extract only the duplicates, i.e.

run id Diff
1 4 1024
1 4 1
1 4 1
1 4 65
1 4 1
1 4 1
1 11 475
1 11 1
1 11 1
2 18 0
2 18 1
2 18 1
2 18 1

Using the command

    mydata_extract %>% group_by(id) %>% filter(n() > 1)

does not extract the duplicates; in fact, I get the complete set of data returned. Is there something about "filter(n() > 1)" that I need to change? I'm a beginner with R. Sorry my data table is not formatting correctly; it looks okay in preview!

I will also want to group my data by "run" first.

Upvotes: 1

Views: 64

Answers (1)

Julian

Reputation: 9240

Maybe add run and id in the group_by()?

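    library(dplyr)

    # Rebuild the sample data, then keep only (run, id) groups with more than one row
    df <- tibble::tribble(
      ~"run", ~"id", ~"Diff",
      1, 20, 0,
      1, 4, 1024,
      1, 4, 1,
      1, 4, 1,
      1, 4, 65,
      1, 4, 1,
      1, 4, 1,
      1, 11, 4,
      1, 11, 1,
      1, 11, 1,
      2, 25, 0,
      2, 18, 0,
      2, 18, 1,
      2, 18, 1,
      2, 18, 1
    ) %>%
      group_by(run, id) %>%   # count rows per id within each run
      filter(n() > 1)         # n() is the number of rows in the current group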



# A tibble: 13 x 3
# Groups:   run, id [3]
     run    id  Diff
   <dbl> <dbl> <dbl>
 1     1     4  1024
 2     1     4     1
 3     1     4     1
 4     1     4    65
 5     1     4     1
 6     1     4     1
 7     1    11     4
 8     1    11     1
 9     1    11     1
10     2    18     0
11     2    18     1
12     2    18     1
13     2    18     1
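
One possible reason the original group_by(id) call returned every row: in the full 2,000-row data, an id that occurs only once within a run may occur again in another run, so grouping by id alone still counts it more than once. This is a guess about the full data, not something the question confirms. A minimal sketch with a made-up data frame (the name `toy` and its values are hypothetical):

```r
library(dplyr)

# Hypothetical data: id 20 occurs once in run 1 and once in run 2
toy <- tibble::tribble(
  ~run, ~id, ~Diff,
  1, 20, 0,
  1,  4, 1,
  1,  4, 1,
  2, 20, 0
)

# Grouping by id alone: id 20 has two rows overall, so no row is dropped
by_id <- toy %>%
  group_by(id) %>%
  filter(n() > 1) %>%
  ungroup()

# Grouping by run and id: id 20 is a singleton within each run and is dropped
by_run_id <- toy %>%
  group_by(run, id) %>%
  filter(n() > 1) %>%
  ungroup()

nrow(by_id)      # all 4 rows survive
nrow(by_run_id)  # only the 2 rows with id 4 survive
```

If the real data behaves like `toy`, grouping by both columns is what separates the per-run singletons from the true repeats.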

You can add a mutate() to see how n() works: it counts the number of rows in each group, e.g.

    df %>%
      group_by(run, id) %>%
      mutate(n = n())
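
Because df above already holds the filtered result, every n it shows is greater than 1. To see the counts for the singleton rows as well, the same idea can be applied to unfiltered data. The sketch below assumes a small hypothetical data frame `raw`, and swaps in dplyr's add_count(), which does roughly what the group_by()/mutate() pair does but returns the data ungrouped. It is an alternative, not what the answer itself uses.

```r
library(dplyr)

# Hypothetical unfiltered data, shaped like the table in the question
raw <- tibble::tribble(
  ~run, ~id, ~Diff,
  1, 20,    0,
  1,  4, 1024,
  1,  4,    1,
  2, 25,    0,
  2, 18,    0,
  2, 18,    1
)

# Add the per-(run, id) row count as a column named n
counted <- raw %>% add_count(run, id)

# Keep ids that occur more than once within a run, then drop the helper column
dups <- counted %>%
  filter(n > 1) %>%
  select(-n)
```

Printing `counted` before filtering makes it easy to check which rows the n > 1 condition will remove.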

Upvotes: 1
