Removing rows in a data frame based on multiple criteria in R

Question

I hope that I have formatted my question correctly, as this is my first time posting and I am fairly new to R.

Below is a small sample of some athlete movement data that I am currently using.

```
      Player   Period      Dist    Date          Type
122 Player_2  Session 4245.9002 31/7/18 Main Training
123 Player_1  Session 4868.2153  2/8/18 Main Training
124 Player_2  Session 4515.1996  2/8/18 Main Training
125 Player_2  Session 3215.8634  7/8/18 Main Training
126 Player_2 Modified  551.8737  7/8/18 Main Training
127 Player_2  Session 4264.7384  9/8/18 Main Training
128 Player_1  Session 4038.1687 16/8/18 Main Training
129 Player_2  Session 4751.6978 16/8/18 Main Training
130 Player_1      RTP 4038.1687 16/8/18 Main Training
131 Player_2 Modified  229.6872 16/8/18 Main Training
132 Player_2 Modified  342.2797 16/8/18 Main Training
133 Player_1  Session 3573.4509 23/8/18 Main Training
134 Player_2  Session 3717.3467 23/8/18 Main Training
```

I would like to remove rows of the data frame based on multiple criteria using dplyr. Specifically, I would like to remove a player's Session rows when that player also has a Modified or RTP row on the same Date. For example, as Player_2 completed Modified training on 7/8/2018, I would like his Session data removed for that date.

```
      Player   Period      Dist   Date          Type
125 Player_2  Session 3215.8634 7/8/18 Main Training
126 Player_2 Modified  551.8737 7/8/18 Main Training
```

Likewise for 16/8/2018, where Player_1 and Player_2 completed RTP and Modified training, respectively, on that day.

```
      Player   Period      Dist    Date          Type
128 Player_1  Session 4038.1687 16/8/18 Main Training
129 Player_2  Session 4751.6978 16/8/18 Main Training
130 Player_1      RTP 4038.1687 16/8/18 Main Training
131 Player_2 Modified  229.6872 16/8/18 Main Training
132 Player_2 Modified  342.2797 16/8/18 Main Training
```

I have filtered data in the past using code such as this.

```
db18 <- db18 %>%
  filter(Period %in% c("Session"))
```

However, I wish to remove athlete Session data for dates that also contain Modified or RTP, so that it doesn't 'contaminate' the analysis I am trying to perform. Is this possible, and if so, how can I do it?

Any help will be greatly appreciated. Thanks.

mrjoh3 · Accepted Answer

One approach is to call group_by() first, so that the filter is applied within each group. In the code below I have used group_by() and mutate() to create a new column on which to filter. There may be a more elegant solution, but this might get you started.

```
library(dplyr)  # provides %>%, group_by(), mutate(), filter()

# Sample data from the question (Type column omitted)
df <- tibble::tribble(
  ~Player,    ~Period,    ~Dist,     ~Date,
  'Player_2', 'Session',  4245.9002, '31/7/18',
  'Player_1', 'Session',  4868.2153, '2/8/18',
  'Player_2', 'Session',  4515.1996, '2/8/18',
  'Player_2', 'Session',  3215.8634, '7/8/18',
  'Player_2', 'Modified',  551.8737, '7/8/18',
  'Player_2', 'Session',  4264.7384, '9/8/18',
  'Player_1', 'Session',  4038.1687, '16/8/18',
  'Player_2', 'Session',  4751.6978, '16/8/18',
  'Player_1', 'RTP',      4038.1687, '16/8/18',
  'Player_2', 'Modified',  229.6872, '16/8/18',
  'Player_2', 'Modified',  342.2797, '16/8/18',
  'Player_1', 'Session',  3573.4509, '23/8/18',
  'Player_2', 'Session',  3717.3467, '23/8/18'
)

df %>%
  group_by(Player, Date) %>%
  # Flag every row of a Player/Date group that has Session together with Modified or RTP
  mutate(filter_col = ifelse(all(c('Session','Modified') %in% Period), 'delete', 'keep'),
         filter_col = ifelse(all(c('Session','RTP') %in% Period), 'delete', filter_col)) %>%
  ungroup() %>%
  filter(filter_col == 'keep')
```
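
Note that filtering on filter_col == 'keep' drops every row in a flagged Player/Date group, so the Modified and RTP rows go along with the Session rows. If only the Session rows should be removed, a grouped filter() might do it without the helper column. This is a minimal sketch, not a tested solution for the full data set: the name `result` is just for illustration, the data is a small subset of the sample from the question, and it assumes the Period values are exactly "Session", "Modified", and "RTP".

```r
library(dplyr)

# Small subset of the sample data from the question
df <- tibble::tribble(
  ~Player,    ~Period,    ~Dist,     ~Date,
  "Player_2", "Session",  3215.8634, "7/8/18",
  "Player_2", "Modified",  551.8737, "7/8/18",
  "Player_2", "Session",  4264.7384, "9/8/18",
  "Player_1", "Session",  4038.1687, "16/8/18",
  "Player_1", "RTP",      4038.1687, "16/8/18"
)

result <- df %>%
  group_by(Player, Date) %>%
  # Drop a Session row only when the same Player/Date group
  # also contains a Modified or RTP row; keep everything else
  filter(!(Period == "Session" & any(Period %in% c("Modified", "RTP")))) %>%
  ungroup()

result
```

Inside a grouped filter(), any() is evaluated once per Player/Date group, so the condition only affects Session rows in groups that also have Modified or RTP. Session rows on other dates are left alone.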
