Gabriel Liu
Gabriel Liu

Reputation: 45

How do I filter rows based on two values within a column?

Totally new to R and I trying to solve this using the dplyr package. I want to filter out and return countries that both have Import and Export values and view them separately. I tried a lot of methods such as select and filter but have been unable to do so.

Country Year    Quantity    Description Import/Export
A   2001    10  Frozen  Export
B   2001    50  Fresh   Import
B   2004    20  Frozen  Export
C   2003    30  Frozen  Import
C   2005    40  Fresh   Export
C   2006    60  Frozen  Import
D   2007    290 Fresh   Import

Ideally, the end result should be this:

Country Year    Quantity    Description Import/Export
B   2001    50  Fresh   Import
B   2004    20  Frozen  Export
C   2003    30  Frozen  Import
C   2005    40  Fresh   Export
C   2006    60  Frozen  Import

Upvotes: 2

Views: 119

Answers (4)

akrun
akrun

Reputation: 887901

Using data.table

library(data.table)
setDT(df)[df[, .I[all(c('Import', 'Export') %in% `Import/Export`)], Country]$V1]
   Country Year Quantity Description Import/Export
1:       B 2001       50       Fresh        Import
2:       B 2004       20      Frozen        Export
3:       C 2003       30      Frozen        Import
4:       C 2005       40       Fresh        Export
5:       C 2006       60      Frozen        Import

Upvotes: 1

GuedesBF
GuedesBF

Reputation: 9878

We can group_by() Country, then filter all groups with any "Import/Export" =='Import' and any ""Import/Export"" == 'Export'

library(dplyr)

df %>% group_by(Country) %>%
        filter(any(`Import/Export`=='Import') & 
               any(`Import/Export`=='Export')) %>%
        ungroup()

# A tibble: 5 x 5
  Country  Year Quantity Description `Import/Export`
  <chr>   <dbl>    <dbl> <chr>       <chr>          
1 B        2001       50 Fresh       Import         
2 B        2004       20 Frozen      Export         
3 C        2003       30 Frozen      Import         
4 C        2005       40 Fresh       Export         
5 C        2006       60 Frozen      Import 

data

structure(list(Country = c("A", "B", "B", "C", "C", "C", "D"), 
    Year = c(2001, 2001, 2004, 2003, 2005, 2006, 2007), Quantity = c(10, 
    50, 20, 30, 40, 60, 290), Description = c("Frozen", "Fresh", 
    "Frozen", "Frozen", "Fresh", "Frozen", "Fresh"), `Import/Export` = c("Export", 
    "Import", "Export", "Import", "Export", "Import", "Import"
    )), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"
))

Upvotes: 3

Ronak Shah
Ronak Shah

Reputation: 389275

Using data from @GuedesBF answer here is another dplyr way to filter groups which has both 'Import' and 'Export'.

library(dplyr)

df %>%
  group_by(Country) %>%
  filter(all(c('Import', 'Export') %in% `Import/Export`)) %>%
  ungroup()

# Country  Year Quantity Description `Import/Export`
#  <chr>   <dbl>    <dbl> <chr>       <chr>          
#1 B        2001       50 Fresh       Import         
#2 B        2004       20 Frozen      Export         
#3 C        2003       30 Frozen      Import         
#4 C        2005       40 Fresh       Export         
#5 C        2006       60 Frozen      Import         

Upvotes: 3

danlooo
danlooo

Reputation: 10637

library(tidyverse)

data <- tribble(
  ~Country, ~Year, ~Quantity, ~Description, ~`Import/Export`,
  "A", 2001, 10, "Frozen", "Export",
  "B", 2001, 50, "Fresh", "Import",
  "B", 2004, 20, "Frozen", "Export",
  "C", 2003, 30, "Frozen", "Import",
  "C", 2005, 40, "Fresh", "Export",
  "C", 2006, 60, "Frozen", "Import",
  "D", 2007, 290, "Fresh", "Import"
)
data
#> # A tibble: 7 x 5
#>   Country  Year Quantity Description `Import/Export`
#>   <chr>   <dbl>    <dbl> <chr>       <chr>          
#> 1 A        2001       10 Frozen      Export         
#> 2 B        2001       50 Fresh       Import         
#> 3 B        2004       20 Frozen      Export         
#> 4 C        2003       30 Frozen      Import         
#> 5 C        2005       40 Fresh       Export         
#> 6 C        2006       60 Frozen      Import         
#> 7 D        2007      290 Fresh       Import

selected_countries <-
  data %>%
  mutate(is_there = TRUE) %>%
  distinct(Country, `Import/Export`, is_there) %>%
  pivot_wider(names_from = "Import/Export", values_from = is_there) %>%
  filter(!is.na(Export) & !is.na(Import)) %>%
  pull(Country) %>%
  unique()
selected_countries
#> [1] "B" "C"

data %>% filter(Country %in% selected_countries)
#> # A tibble: 5 x 5
#>   Country  Year Quantity Description `Import/Export`
#>   <chr>   <dbl>    <dbl> <chr>       <chr>          
#> 1 B        2001       50 Fresh       Import         
#> 2 B        2004       20 Frozen      Export         
#> 3 C        2003       30 Frozen      Import         
#> 4 C        2005       40 Fresh       Export         
#> 5 C        2006       60 Frozen      Import

Created on 2021-09-11 by the reprex package (v2.0.1)

Upvotes: 0

Related Questions