Reputation: 984
I want to filter a dataframe using dplyr using a conditional. The condition I want to test is whether a country-year combination has two versions.
df <- data.frame(country = c("country1", "country2", "country1", "country2", "country3"), year = rep(2011,5), version = c("versionA", "versionA", "versionB", "versionB", "versionB"))
Here is what I tried after looking here:
df %>%
group_by(country, year) %>%
{if unique(version)==1 . else filter(version == "versionA")}
What I am hoping to get is a dataframe that looks like this:
country year version
country1 2011 versionA
country2 2011 versionA
country3 2011 versionB
Upvotes: 2
Views: 436
Reputation: 5798
Base R one-liner thanks (@akrun):
df[!(duplicated(df[1:2])),]
Base R one-liner:
df[!(duplicated(df$country, df$year)),]
Tidyverse solution:
library(tidyverse)
df %>%
filter(!(duplicated(country, year)))
A more generic base R solution:
# Create a counter of versions for each year and country:
df$tmp <- with(lapply(df, function(x){if(is.factor(x)){as.character(x)}else{x}}),
ave(version, paste0(country, year), FUN = seq.int))
# Subset the dataframe to hold only the first record for each year/country:
df[which(df$tmp == 1), ]
A more generic tidyverse solution:
df %>%
arrange(version) %>%
filter(!(duplicated(country, year)))
Upvotes: 1
Reputation: 887901
After grouping by 'country', 'year', filter
if number of distinct elements is greater than 1 return the 'versionA', or else return the first element
library(dplyr)
df %>%
group_by(country, year) %>%
filter((n_distinct(version) > 1 & version == 'versionA')|row_number() == 1)
# A tibble: 3 x 3
# Groups: country, year [3]
# country year version
# <fct> <dbl> <fct>
#1 country1 2011 versionA
#2 country2 2011 versionA
#3 country3 2011 versionB
Or this can be added in a if/else
df %>%
group_by(country, year) %>%
filter(if(n_distinct(version) > 1) version == 'versionA'
else row_number() ==1)
# A tibble: 3 x 3
# Groups: country, year [3]
# country year version
# <fct> <dbl> <fct>
#1 country1 2011 versionA
#2 country2 2011 versionA
#3 country3 2011 versionB
Or another option is arrange
df %>%
arrange(country, year, version != 'versionA') %>%
group_by(country, year) %>%
slice(1)
Or with summarize
df %>%
group_by(country, year) %>%
summarise(version = if(n_distinct(version) > 1) 'versionA' else first(version))
Or using data.table
library(data.table)
setDT(df)[, .SD[if(n_distinct(version) > 1) version == 'versionA'
else 1], .(country, year)]
Upvotes: 1
Reputation: 389275
To count the number of unique values we can use n_distinct
and filter the rows based on that.
library(dplyr)
df %>%
group_by(country, year) %>%
filter(if(n_distinct(version) == 2) version == 'versionA' else TRUE)
# country year version
# <fct> <dbl> <fct>
#1 country1 2011 versionA
#2 country2 2011 versionA
#3 country3 2011 versionB
Upvotes: 2