user14796199
user14796199

Reputation: 13

subsetting a column by two conditions in another column r

I have a dataset whereby some hex_id's are detected both us (upstream) and ds (downstream). There are about 500 fish that were detected on both us and ds locations. Im trying to subset column hex_id so it returns only the rows where hex_id == us and hex_id == ds.

hex_id         loc det#
3D9.1C2D9B1656  us  1
3D9.1C2D9B1C20  us  2
3D9.1C2D9B1C91  us  1
3D9.1C2D9B2110  ds  1
3D9.1C2D9B2226  ds  1
3D9.1C2D9B2CA7  us  19
3D9.1C2D9B3128  us  2
3D9.1C2D9B376C  us  2
3D9.1C2D9B39B1  us  1
3D9.1C2D9B3A1E  us  5
3D9.1C2D9B4717  ds  1
3D9.1C2D9B4717  us  1
3D9.1C2D9B4731  ds  1
3D9.1C2D9B4740  us  25

I've been trying this:

pit_both <- pitdata %>%
  filter(down == "ds" & up == "us") %>% 
  summarise(fish = n_distinct(hex_id))

And getting this:

pit_both
  fish
1    0

I think it's because each detection has its own line, so no where in the datasheet does a singular row equal ds and us at the same time, its the hex_id that I need to equal up and ds.

Any suggestions??

Upvotes: 1

Views: 41

Answers (3)

user10917479
user10917479

Reputation:

Another approach is to pivot out the det column (I renamed det# to make it a valid column name) by loc, so that it is easier to filter. All depends on how you want to use the data.

library(dplyr)
library(tidyr)

pitdata %>% 
  pivot_wider(names_from = loc, values_from = det) %>% 
  filter(!is.na(us), !is.na(ds))

# # A tibble: 1 x 3
#     hex_id            us    ds
#     <fct>          <int> <int>
#   1 3D9.1C2D9B4717     1     1

Upvotes: 1

Ian Campbell
Ian Campbell

Reputation: 24888

The reason your approach isn't working is because none of the rows of loc are both "us" AND "ds".

Here's an approach with dplyr. First group_by(hex_id) and then use filter to check that loc contains both "us" and "ds" for that group.

pitdata %>%
  group_by(hex_id) %>%
  dplyr::filter(any(loc == "us") & any(loc == "ds"))
## A tibble: 2 x 3
## Groups:   hex_id [1]
#  hex_id         loc    det.
#  <chr>          <chr> <int>
#1 3D9.1C2D9B4717 ds        1
#2 3D9.1C2D9B4717 us        1

From here, it's easy to count fish:

pitdata %>%
  group_by(hex_id) %>%
  dplyr::filter(any(loc == "us") & any(loc == "ds")) %>%
  ungroup %>%
  summarise(fish = n_distinct(hex_id))
# A tibble: 1 x 1
   fish
  <int>
1     1

Upvotes: 0

da11an
da11an

Reputation: 731

The following creates two subsets of your data for "us" and "ds" then finds all of the rows that have a matching hex_id.

library(dplyr)

inner_join(
  pitdata %>% filter(loc == "us"),
  pitdata %>% filter(loc == "ds"),
  by = "hex_id")

The output would be something like:

hex_id            loc.x  det.x loc.y det.y
3D9.1C2D9B4717    us     1     ds    1

If you wanted it back into the original format, you can pivot the table.

Upvotes: 0

Related Questions