Select rows from dataframe with unique combination of values from multiple columns

Question

I have a data.frame in R that is a catalog of results from baseball games for every team for a number of seasons. Some of the columns are team, opponent_team, date, result, team_runs, opponent_runs, etc. My problem is that, because the data.frame is a combination of logs for every team, each row essentially has another row somewhere else in the data.frame that is a mirror image of it.

For example

team  opponent_team  date        result  team_runs  opponent_runs
BAL   BOS            2010-04-05  W       5          4

has another row somewhere else that is

team  opponent_team  date        result  team_runs  opponent_runs
BOS   BAL            2010-04-05  L       4          5

I would like to write some code in dplyr or something similar that selects rows that have a unique combination of the team, opponent_team and date columns. I stress the word combination here because order doesn't matter; I am just trying to get rid of the rows that are mirror images.

Thanks

Kan Nishida · Accepted Answer

Have you tried the distinct() function from dplyr? For your case, it can be something like

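library(dplyr)
# .keep_all = TRUE keeps the other columns (result, team_runs, ...) in the output
df %>% distinct(team, opponent_team, date, .keep_all = TRUE)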
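One caveat: distinct() compares the columns in the order given, so a BAL/BOS row and its BOS/BAL mirror count as two different combinations and both survive. A minimal sketch of one way around that is to build an order-independent key with pmin() and pmax() first. The `games` data below is made up for illustration, `team_a` and `team_b` are hypothetical helper columns, and the sketch assumes team and opponent_team are character columns (convert factors with as.character() first).

```r
library(dplyr)

# Toy data (made-up values): two mirrored rows for one game, plus one unrelated game.
games <- data.frame(
  team          = c("BAL", "BOS", "NYY"),
  opponent_team = c("BOS", "BAL", "TOR"),
  date          = as.Date(c("2010-04-05", "2010-04-05", "2010-04-06")),
  result        = c("W", "L", "W"),
  team_runs     = c(5, 4, 3),
  opponent_runs = c(4, 5, 1),
  stringsAsFactors = FALSE
)

deduped <- games %>%
  mutate(
    team_a = pmin(team, opponent_team),  # alphabetically first team of the pair
    team_b = pmax(team, opponent_team)   # alphabetically last team of the pair
  ) %>%
  # Mirrored rows now share the same (team_a, team_b, date) key; keep the first of each.
  distinct(team_a, team_b, date, .keep_all = TRUE) %>%
  select(-team_a, -team_b)               # drop the helper columns again
```

This keeps whichever of the two mirrored rows appears first. If two teams can play each other twice on the same date (a doubleheader), this key would merge those games too, so a game identifier would probably need to be part of the key in that case.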

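Another alternative is to use the duplicated() function from base R inside dplyr's filter(), like below. Note that duplicated() takes a single object, so the columns are wrapped in a data.frame.

df %>% filter(!duplicated(data.frame(team, opponent_team, date)))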
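For anyone who prefers to stay in base R, here is a rough sketch that uses duplicated() on a key whose team pair is sorted, so mirrored rows compare as equal. The `games` data is invented for illustration, `pair_key` is a hypothetical helper, and the team columns are assumed to be character vectors rather than factors.

```r
# Toy data (made-up values): two mirrored rows for one game, plus one unrelated game.
games <- data.frame(
  team          = c("BAL", "BOS", "NYY"),
  opponent_team = c("BOS", "BAL", "TOR"),
  date          = as.Date(c("2010-04-05", "2010-04-05", "2010-04-06")),
  result        = c("W", "L", "W"),
  stringsAsFactors = FALSE
)

# Sort each team pair so that (BAL, BOS) and (BOS, BAL) produce the same key row.
pair_key <- data.frame(
  team_a = pmin(games$team, games$opponent_team),
  team_b = pmax(games$team, games$opponent_team),
  date   = games$date
)

# duplicated() flags the second and later occurrences of each key; keep the rest.
deduped_base <- games[!duplicated(pair_key), ]
```

As with any first-occurrence filter, which perspective of each game is kept (the winner's or the loser's row) depends on the row order of the input.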
