I have a data.frame in R that is a catalog of results from baseball games for every team for a number of seasons. Some of the columns are team, opponent_team, date, result, team_runs, opponent_runs, etc. My problem is that, because the data.frame is a combination of logs for every team, each row essentially has another row somewhere else in the data.frame that is a mirror image of it.
For example, the row
team opponent_team date result team_runs opponent_runs
BAL BOS 2010-04-05 W 5 4
has a mirror-image row somewhere else:
team opponent_team date result team_runs opponent_runs
BOS BAL 2010-04-05 L 4 5
I would like to write some code in dplyr, or something similar, that selects rows that have a unique combination of the team, opponent_team and date columns. I stress the word combination here because the order of team and opponent_team doesn't matter; I am just trying to get rid of the rows that are mirror images.
Thanks
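One way to make the comparison ignore order is to sort the two team names within each row, so that a row and its mirror image produce the same key. A minimal base R sketch using the two example rows above; the data frame `games` and the helper columns `team_a`/`team_b` are hypothetical names chosen for illustration:

```r
# The two mirror-image rows from the example above
games <- data.frame(
  team          = c("BAL", "BOS"),
  opponent_team = c("BOS", "BAL"),
  date          = as.Date(c("2010-04-05", "2010-04-05")),
  result        = c("W", "L"),
  team_runs     = c(5, 4),
  opponent_runs = c(4, 5),
  stringsAsFactors = FALSE
)

# pmin()/pmax() compare the two character columns element by element,
# so each row gets its pair of teams in alphabetical order
games$team_a <- pmin(games$team, games$opponent_team)
games$team_b <- pmax(games$team, games$opponent_team)

# Both rows now carry the same (team_a, team_b, date) key
games[, c("team_a", "team_b", "date")]
```

Once the key no longer depends on which side a row was logged from, any "keep the first occurrence" tool can drop the mirror rows.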
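Have you tried the distinct() function from dplyr? Because the order of team and opponent_team doesn't matter, first put each pair of teams in a fixed order with pmin() and pmax(), so that a row and its mirror image get the same key. For your case, it can be something like:
library(dplyr)
# pmin()/pmax() sort each team pair, so BAL-BOS and BOS-BAL get the same key
df %>% distinct(t1 = pmin(team, opponent_team), t2 = pmax(team, opponent_team),
                date, .keep_all = TRUE)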
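A self-contained check of the distinct() approach with an order-independent key (each team pair sorted with pmin()/pmax()), run on the two example rows from the question. It assumes dplyr is installed; `t1` and `t2` are hypothetical helper column names that are dropped again at the end:

```r
library(dplyr)

# The two mirror-image rows from the question
df <- data.frame(
  team          = c("BAL", "BOS"),
  opponent_team = c("BOS", "BAL"),
  date          = as.Date(c("2010-04-05", "2010-04-05")),
  result        = c("W", "L"),
  team_runs     = c(5, 4),
  opponent_runs = c(4, 5),
  stringsAsFactors = FALSE
)

deduped <- df %>%
  # t1/t2 hold the team pair in alphabetical order, so a row and its mirror share a key;
  # .keep_all = TRUE keeps every original column of the first row per key
  distinct(t1 = pmin(team, opponent_team),
           t2 = pmax(team, opponent_team),
           date,
           .keep_all = TRUE) %>%
  # drop the helper key columns again
  select(-t1, -t2)

deduped
```

Because the key is (team pair, date), two separate games between the same teams on the same day, such as a doubleheader, would also be collapsed into one row; if the data contains those, a column that tells them apart needs to be part of the key.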
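Another alternative is to use the duplicated() function from base R inside dplyr's filter(). duplicated() works on a single vector or data frame (its second argument is incomparables, not another column), so combine the sorted key columns into a data frame first:
df %>%
  filter(!duplicated(data.frame(pmin(team, opponent_team), pmax(team, opponent_team), date)))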
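The duplicated() route can also be written entirely in base R, without dplyr: build a data frame of the order-independent key and keep the rows whose key has not been seen before. A minimal sketch on the same two example rows; `key` is a hypothetical helper name:

```r
# The two mirror-image rows from the question
df <- data.frame(
  team          = c("BAL", "BOS"),
  opponent_team = c("BOS", "BAL"),
  date          = as.Date(c("2010-04-05", "2010-04-05")),
  result        = c("W", "L"),
  team_runs     = c(5, 4),
  opponent_runs = c(4, 5),
  stringsAsFactors = FALSE
)

# One key row per input row: sorted team pair plus date
key <- data.frame(
  t1   = pmin(df$team, df$opponent_team),
  t2   = pmax(df$team, df$opponent_team),
  date = df$date
)

# duplicated() on a data frame flags rows that repeat an earlier key row,
# so negating it keeps the first row logged for each game
deduped <- df[!duplicated(key), ]
deduped
```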