Reputation: 13
I have a dataset whereby some hex_id's are detected both us (upstream) and ds (downstream). There are about 500 fish that were detected on both us and ds locations. Im trying to subset column hex_id so it returns only the rows where hex_id == us and hex_id == ds.
hex_id loc det#
3D9.1C2D9B1656 us 1
3D9.1C2D9B1C20 us 2
3D9.1C2D9B1C91 us 1
3D9.1C2D9B2110 ds 1
3D9.1C2D9B2226 ds 1
3D9.1C2D9B2CA7 us 19
3D9.1C2D9B3128 us 2
3D9.1C2D9B376C us 2
3D9.1C2D9B39B1 us 1
3D9.1C2D9B3A1E us 5
3D9.1C2D9B4717 ds 1
3D9.1C2D9B4717 us 1
3D9.1C2D9B4731 ds 1
3D9.1C2D9B4740 us 25
I've been trying this:
pit_both <- pitdata %>%
filter(down == "ds" & up == "us") %>%
summarise(fish = n_distinct(hex_id))
And getting this:
pit_both
fish
1 0
I think it's because each detection has its own line, so no where in the datasheet does a singular row equal ds and us at the same time, its the hex_id that I need to equal up and ds.
Any suggestions??
Upvotes: 1
Views: 41
Reputation:
Another approach is to pivot out the det
column (I renamed det#
to make it a valid column name) by loc
, so that it is easier to filter. All depends on how you want to use the data.
library(dplyr)
library(tidyr)
pitdata %>%
pivot_wider(names_from = loc, values_from = det) %>%
filter(!is.na(us), !is.na(ds))
# # A tibble: 1 x 3
# hex_id us ds
# <fct> <int> <int>
# 1 3D9.1C2D9B4717 1 1
Upvotes: 1
Reputation: 24888
The reason your approach isn't working is because none of the rows of loc
are both "us"
AND "ds"
.
Here's an approach with dplyr
. First group_by(hex_id)
and then use filter
to check that loc
contains both "us"
and "ds"
for that group.
pitdata %>%
group_by(hex_id) %>%
dplyr::filter(any(loc == "us") & any(loc == "ds"))
## A tibble: 2 x 3
## Groups: hex_id [1]
# hex_id loc det.
# <chr> <chr> <int>
#1 3D9.1C2D9B4717 ds 1
#2 3D9.1C2D9B4717 us 1
From here, it's easy to count fish:
pitdata %>%
group_by(hex_id) %>%
dplyr::filter(any(loc == "us") & any(loc == "ds")) %>%
ungroup %>%
summarise(fish = n_distinct(hex_id))
# A tibble: 1 x 1
fish
<int>
1 1
Upvotes: 0
Reputation: 731
The following creates two subsets of your data for "us" and "ds" then finds all of the rows that have a matching hex_id.
library(dplyr)
inner_join(
pitdata %>% filter(loc == "us"),
pitdata %>% filter(loc == "ds"),
by = "hex_id")
The output would be something like:
hex_id loc.x det.x loc.y det.y
3D9.1C2D9B4717 us 1 ds 1
If you wanted it back into the original format, you can pivot the table.
Upvotes: 0