I hope that I have formatted my question correctly, as this is my first time posting and I am fairly new to R.
Below is a small sample of some athlete movement data that I am currently using.
```
Player Period Dist Date Type
122 Player_2 Session 4245.9002 31/7/18 Main Training
123 Player_1 Session 4868.2153 2/8/18 Main Training
124 Player_2 Session 4515.1996 2/8/18 Main Training
125 Player_2 Session 3215.8634 7/8/18 Main Training
126 Player_2 Modified 551.8737 7/8/18 Main Training
127 Player_2 Session 4264.7384 9/8/18 Main Training
128 Player_1 Session 4038.1687 16/8/18 Main Training
129 Player_2 Session 4751.6978 16/8/18 Main Training
130 Player_1 RTP 4038.1687 16/8/18 Main Training
131 Player_2 Modified 229.6872 16/8/18 Main Training
132 Player_2 Modified 342.2797 16/8/18 Main Training
133 Player_1 Session 3573.4509 23/8/18 Main Training
134 Player_2 Session 3717.3467 23/8/18 Main Training
```
I would like to remove rows of the data frame based on multiple criteria using `dplyr`. Specifically, I would like to remove rows containing `Session` where there is a `Modified` or `RTP` row sharing the same `Date`. For example, as `Player_2` completed `Modified` training on 7/8/2018, I would like his `Session` data removed for that date.
```
Player Period Dist Date Type
125 Player_2 Session 3215.8634 7/8/18 Main Training
126 Player_2 Modified 551.8737 7/8/18 Main Training
```
Likewise for 16/8/2018, where `Player_1` and `Player_2` completed `RTP` and `Modified` training, respectively, on that day.
```
Player Period Dist Date Type
128 Player_1 Session 4038.1687 16/8/18 Main Training
129 Player_2 Session 4751.6978 16/8/18 Main Training
130 Player_1 RTP 4038.1687 16/8/18 Main Training
131 Player_2 Modified 229.6872 16/8/18 Main Training
132 Player_2 Modified 342.2797 16/8/18 Main Training
```
I have filtered data in the past using code such as this.
```
db18 <- db18 %>%
  filter(Period %in% c("Session"))
```
However, I wish to remove an athlete's `Session` data on dates that also contain `Modified` or `RTP`, so that it doesn't 'contaminate' the analysis I am trying to perform. How can I do this, if it is possible?
Any help will be greatly appreciated. Thanks.
Upvotes: 2
I hope this helps you.
```
player <- read.csv("player.csv")
player
Id Player Period Dist Date Type
1 122 Player_2 Session 4245.9002 31/07/18 Main Training
2 123 Player_1 Session 4868.2153 02/08/18 Main Training
3 124 Player_2 Session 4515.1996 02/08/18 Main Training
4 125 Player_2 Session 3215.8634 07/08/18 Main Training
5 126 Player_2 Modified 551.8737 07/08/18 Main Training
6 127 Player_2 Session 4264.7384 09/08/18 Main Training
7 128 Player_1 Session 4038.1687 16/08/18 Main Training
8 129 Player_2 Session 4751.6978 16/08/18 Main Training
9 130 Player_1 RTP 4038.1687 16/08/18 Main Training
10 131 Player_2 Modified 229.6872 16/08/18 Main Training
11 132 Player_2 Modified 342.2797 16/08/18 Main Training
12 133 Player_1 Session 3573.4509 23/08/18 Main Training
13 134 Player_2 Session 3717.3467 23/08/18 Main Training
```
Group by the `Player` and `Date` columns, then collect the `Session` rows for any player and date that also has a `Modified` or `RTP` row.
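```
library(dplyr)

# group_by() replaces the deprecated group_by_(.dots = ...)
removable <- player %>%
  group_by(Player, Date) %>%
  # keep only groups that have a Session row and a Modified or RTP row
  filter(any(Period == 'Session') & any(Period %in% c('Modified', 'RTP'))) %>%
  # within those groups, the Session rows are the ones to remove
  filter(Period == 'Session') %>%
  ungroup()
```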
Now drop the rows of the `player` data frame whose `Id` matches any `removable$Id`.
```
player <- player[!(player$Id %in% removable$Id), ]
player
Id Player Period Dist Date Type
1 122 Player_2 Session 4245.9002 31/07/18 Main Training
2 123 Player_1 Session 4868.2153 02/08/18 Main Training
3 124 Player_2 Session 4515.1996 02/08/18 Main Training
5 126 Player_2 Modified 551.8737 07/08/18 Main Training
6 127 Player_2 Session 4264.7384 09/08/18 Main Training
9 130 Player_1 RTP 4038.1687 16/08/18 Main Training
10 131 Player_2 Modified 229.6872 16/08/18 Main Training
11 132 Player_2 Modified 342.2797 16/08/18 Main Training
12 133 Player_1 Session 3573.4509 23/08/18 Main Training
13 134 Player_2 Session 3717.3467 23/08/18 Main Training
```
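If you prefer to stay within `dplyr` for the last step, `anti_join()` can do the same job as the `%in%` subsetting. The following is only a sketch: the small `player` data frame is a cut-down stand-in for the CSV (an assumption for illustration), and `cleaned` is a hypothetical name.

```r
library(dplyr)

# Cut-down stand-in for the `player` data frame (illustrative only)
player <- tibble::tribble(
  ~Id, ~Player,    ~Period,    ~Date,
  125, "Player_2", "Session",  "07/08/18",
  126, "Player_2", "Modified", "07/08/18",
  127, "Player_2", "Session",  "09/08/18",
  128, "Player_1", "Session",  "16/08/18",
  130, "Player_1", "RTP",      "16/08/18"
)

# Session rows in any Player/Date group that also has Modified or RTP
removable <- player %>%
  group_by(Player, Date) %>%
  filter(Period == "Session", any(Period %in% c("Modified", "RTP"))) %>%
  ungroup()

# Keep every row of `player` that has no matching Id in `removable`
cleaned <- anti_join(player, removable, by = "Id")
cleaned
```

This relies on `Id` being unique per row; if it is not, joining on more columns would be needed.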
Upvotes: 2
One approach is to use `group_by()` first, so that the filter is applied within each group. In the code below I have used `group_by()` and `mutate()` to create a new column on which to filter. There may be a more elegant solution, but this might get you started.
```
library(dplyr)

df <- tibble::tribble(
  ~Player,    ~Period,    ~Dist,     ~Date,
  'Player_2', 'Session',  4245.9002, '31/7/18',
  'Player_1', 'Session',  4868.2153, '2/8/18',
  'Player_2', 'Session',  4515.1996, '2/8/18',
  'Player_2', 'Session',  3215.8634, '7/8/18',
  'Player_2', 'Modified', 551.8737,  '7/8/18',
  'Player_2', 'Session',  4264.7384, '9/8/18',
  'Player_1', 'Session',  4038.1687, '16/8/18',
  'Player_2', 'Session',  4751.6978, '16/8/18',
  'Player_1', 'RTP',      4038.1687, '16/8/18',
  'Player_2', 'Modified', 229.6872,  '16/8/18',
  'Player_2', 'Modified', 342.2797,  '16/8/18',
  'Player_1', 'Session',  3573.4509, '23/8/18',
  'Player_2', 'Session',  3717.3467, '23/8/18'
)

df %>%
  group_by(Player, Date) %>%
  # flag groups where Session occurs together with Modified or RTP
  mutate(filter_col = ifelse(all(c('Session', 'Modified') %in% Period), 'delete', 'keep'),
         filter_col = ifelse(all(c('Session', 'RTP') %in% Period), 'delete', filter_col)) %>%
  ungroup() %>%
  # drop only the Session rows of flagged groups; Modified and RTP rows stay
  filter(!(filter_col == 'delete' & Period == 'Session'))
```
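A more compact variant, offered as a sketch rather than the definitive way to do it, skips the helper column and puts the whole condition inside a single grouped `filter()`. It drops a row only when it is a `Session` row and its `Player`/`Date` group also contains `Modified` or `RTP`. The shortened `df` and the name `result` are assumptions for illustration.

```r
library(dplyr)

# Shortened version of the sample data (illustrative only)
df <- tibble::tribble(
  ~Player,    ~Period,    ~Date,
  "Player_2", "Session",  "7/8/18",
  "Player_2", "Modified", "7/8/18",
  "Player_2", "Session",  "9/8/18",
  "Player_1", "Session",  "16/8/18",
  "Player_1", "RTP",      "16/8/18",
  "Player_2", "Session",  "16/8/18",
  "Player_2", "Modified", "16/8/18"
)

result <- df %>%
  group_by(Player, Date) %>%
  # any() is evaluated once per group and recycled across its rows
  filter(!(Period == "Session" & any(Period %in% c("Modified", "RTP")))) %>%
  ungroup()

result
```

Here only the `Session` row on 9/8/18 survives among the sessions, because it is the only one without a `Modified` or `RTP` row for the same player and date.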
Upvotes: 1