Reputation: 337
I am trying to filter coral demographic data in a time series. I have a set of corals that have been measured every 3 months. I want to (a) keep all corals that at some point had a maximum diameter of 9, 10, or 11 mm and (b) remove corals that were larger than 9-11 mm in any previous census. Note that I also want to keep corals that are within the size range and did not leave the 9-11 mm range in the next TimeStep, because this constitutes 0 growth and I want to include those corals too.
I have created a sample database to work with. Colony # 1 is an example of a coral that grew past the size range (9-11 mm) and then shrank back to 9. I want Colony # 1 removed from the database completely.
Colony # 2 started out past the desired size range (9-11 mm) then shrank to within the range later. I also want this coral removed because I need to ensure corals that are within the size range did not shrink to it but grew to it.
Colony # 3 is an example of a coral that grew to the size range (9-11 mm) and beyond without shrinking and this is a coral that I want to keep because it grew to the size range.
Colony # 4 is an example of a coral that started off above the size range and therefore needs to be removed.
Colony # 5 is an example of a coral that started below the range, grew into it, then later shrank back into the range. For this scenario, I want to include only the first time the diameter fell into the range, not the second time. This is because the first time is natural growth, whereas the second time is shrinkage and the resulting recovery (which I want to filter out).
Colony # 6 is an example of a coral that started in the size range in TimeStep 1, grew out of it in the next TimeStep, and continued to grow after. In this instance I want to keep all measurements following the first TimeStep so that I can calculate growth between TimeStep 1 and 2.
Colony # 7 is an example of a coral that started in the size range in TimeStep 1 and remained in the range for TimeStep 2. In this case (assuming that the coral does not shrink back to the size range later), I want to keep all measurements following TimeStep 1 and 2. This is a case where the coral had 0 growth from when it was first in the range, and I want to include these corals in the database for analysis.
Colony # 8 is an example of a coral that grew to the size range in TimeStep 3, stayed in the range (10 => 9) in TimeStep 4, shrank below the desired range in TimeStep 5, and grew back to the range in TimeStep 6. For this colony, I want TimeStep 4 included because the coral is considered the same size between TimeStep 3 and 4 (the difference is still within the range of measurement error).
Colony # 9 is an example of a coral that grew to the size range in TimeStep 3, stayed in it in TimeStep 4 (10 => 9), and then grew above the range in TimeSteps 5 and 6. As such, this coral should have ALL measurements (TimeStep 1-6) included in the database because this coral never shrank.
Colony # 10 is an example of a coral that grew to the size range, stayed in it in TimeStep 4, shrank below the range in TimeStep 5, and then grew beyond it in TimeStep 6. In this case, I want to include TimeStep 5 because I want to have a measure of shrinkage from the size range. As such, only TimeStep 6 should be filtered out, since the coral shrank below the size range (9-11 mm).
All told, I want code that filters this database so that a coral is removed entirely if it had a diameter of 9-11 mm at some point but was previously larger than that range, was never at or below the range, or started below the range and never fell within it. I also want to keep any corals that grew to the range and later shrank back to it, while removing the second time they fell in the range. I am looking for a general form of the code that filters out these cases, such that all corals in the database started below 9-11 mm and then grew into that range. Thank you for your time!
Data <- structure(list(Site = c("WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI"
), `Module #` = c(116, 116, 116, 116, 116, 116, 116, 116, 116,
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116,
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116,
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116,
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116),
Side = c("N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N"), TimeStep = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5,
6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6,
1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1,
2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6), Settlement_Area = c(0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336), `Colony #` = c(1,
1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4,
4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7,
7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10,
10, 10, 10), Location = c("C1", "C1", "C1", "C1", "C1", "C1",
"B1", "B1", "B1", "B1", "B1", "B1", "A1", "A1", "A1", "A1",
"A1", "A1", "D1", "D1", "D1", "D1", "D1", "D1", "D1", "D1",
"D1", "D1", "D1", "D1", "A2", "A2", "A2", "A2", "A2", "A2",
"A4", "A4", "A4", "A4", "A4", "A4", "B3", "B3", "B3", "B3",
"B3", "B3", "C2", "C2", "C2", "C2", "C2", "C2", "B4", "B4",
"B4", "B4", "B4", "B4"), `Taxonomic Code` = c("PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC"), `Cover Code` = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1), `Max Diameter (cm)` = c(5, 8, 12, 15, 9, 16, 15, 13,
11, 15, 17, 20, 3, 6, 9, 12, 15, 20, 13, 16, 24, 22, 28,
30, 6, 9, 14, 9, 15, 19, 11, 14, 17, 17, 21, 24, 9, 11, 14,
16, 20, 22, 3, 6, 10, 9, 7, 10, 5, 7, 10, 9, 13, 16, 5, 7,
9, 10, 8, 13)), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -60L), spec = structure(list(
cols = list(Site = structure(list(), class = c("collector_character",
"collector")), `Module #` = structure(list(), class = c("collector_double",
"collector")), Side = structure(list(), class = c("collector_character",
"collector")), TimeStep = structure(list(), class = c("collector_double",
"collector")), Settlement_Area = structure(list(), class = c("collector_double",
"collector")), `Colony #` = structure(list(), class = c("collector_double",
"collector")), Location = structure(list(), class = c("collector_character",
"collector")), `Taxonomic Code` = structure(list(), class = c("collector_character",
"collector")), `Cover Code` = structure(list(), class = c("collector_double",
"collector")), `Max Diameter (cm)` = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
Data_2 <- structure(list(Site = c("WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI",
"WAI", "WAI", "WAI", "WAI", "WAI", "WAI", "WAI"), `Module #` = c(116,
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116,
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116,
116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116, 116
), Side = c("N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N",
"N", "N", "N", "N"), TimeStep = c(1, 2, 3, 4, 1, 2, 3, 4, 5,
6, 1, 2, 3, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 1,
2, 3, 4, 5, 6, 1, 2, 3, 4, 5), Settlement_Area = c(0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336, 0.75902336,
0.75902336, 0.75902336, 0.75902336), `Colony #` = c(1, 1, 1,
1, 3, 3, 3, 3, 3, 3, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7,
7, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10), Location = c("C1",
"C1", "C1", "C1", "A1", "A1", "A1", "A1", "A1", "A1", "D1", "D1",
"D1", "A2", "A2", "A2", "A2", "A2", "A2", "A4", "A4", "A4", "A4",
"A4", "A4", "B3", "B3", "B3", "B3", "C2", "C2", "C2", "C2", "C2",
"C2", "B4", "B4", "B4", "B4", "B4"), `Taxonomic Code` = c("PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC", "PC",
"PC", "PC", "PC", "PC", "PC", "PC"), `Cover Code` = c(1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), `Max Diameter (cm)` = c(5,
8, 12, 15, 3, 6, 9, 12, 15, 20, 6, 9, 14, 11, 14, 17, 17, 21,
24, 9, 11, 14, 16, 20, 22, 3, 6, 10, 9, 5, 7, 10, 9, 13, 16,
5, 7, 9, 10, 8)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -40L), spec = structure(list(cols = list(
Site = structure(list(), class = c("collector_character",
"collector")), `Module #` = structure(list(), class = c("collector_double",
"collector")), Side = structure(list(), class = c("collector_character",
"collector")), TimeStep = structure(list(), class = c("collector_double",
"collector")), Settlement_Area = structure(list(), class = c("collector_double",
"collector")), `Colony #` = structure(list(), class = c("collector_double",
"collector")), Location = structure(list(), class = c("collector_character",
"collector")), `Taxonomic Code` = structure(list(), class = c("collector_character",
"collector")), `Cover Code` = structure(list(), class = c("collector_double",
"collector")), `Max Diameter (cm)` = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
Upvotes: 0
Views: 103
Reputation: 607
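EDIT #2: Just to summarize your filter criteria as far as I understood them (numbered as in the code comments below):
1. The coral's diameter at TimeStep 1 must be below 12, i.e. it must not start out above your size range.
2. The coral must not be smaller than 9 at every census, i.e. it has to reach or pass the range at some point.
3. For each coral, only the measurements before its first shrinkage are kept.
As you keep coral #1 in your desired output, I assume it doesn't really matter whether a coral grows into your desired range or directly jumps over it. Is that correct?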
CODE:
library(dplyr)

Data_filtered <- Data %>%
  group_by(`Colony #`) %>%
  filter(any(TimeStep == 1 & `Max Diameter (cm)` < 12), # criterion 1: starts below 12
         !all(`Max Diameter (cm)` < 9),                 # criterion 2: not below 9 at every census
         # criterion 3: keep only the rows before the first decrease in diameter
         row_number() + 1 <= min(which(lag(`Max Diameter (cm)`) > `Max Diameter (cm)`)))
# test whether the filtering worked ok
all_equal(Data_filtered, Data_2)
[1] TRUE
This now results in the same filtered data frame as your desired output data frame (thanks for that - it makes things a lot easier!).
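In case criterion 3 looks cryptic, here is a minimal base-R sketch of the same idea applied to a single colony's diameters. The helper name `keep_before_first_shrink` is made up for illustration, `c(NA, head(d, -1))` stands in for `dplyr::lag()`, and passing `Inf` to `min()` is an addition so that colonies that never shrink do not trigger the "no non-missing arguments to min" warning.

```r
# Hypothetical helper (not part of the answer's pipeline): returns a logical
# mask that is TRUE for every measurement before the first decrease.
keep_before_first_shrink <- function(d) {
  prev <- c(NA, head(d, -1))           # previous measurement, like dplyr::lag(d)
  shrink_at <- which(prev > d)         # positions where the diameter dropped (NA is skipped)
  first_shrink <- min(shrink_at, Inf)  # Inf if the colony never shrank
  seq_along(d) + 1 <= first_shrink     # same comparison as criterion 3
}

colony_1 <- c(5, 8, 12, 15, 9, 16)  # Colony # 1: first shrinks at TimeStep 5
colony_3 <- c(3, 6, 9, 12, 15, 20)  # Colony # 3: never shrinks

colony_1[keep_before_first_shrink(colony_1)]  # 5 8 12 15
colony_3[keep_before_first_shrink(colony_3)]  # 3 6 9 12 15 20
```

For Colony # 1 the first drop is at position 5 (15 => 9), so only TimeSteps 1-4 are kept. For Colony # 3 there is no drop, the cutoff is `Inf`, and every row passes.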
Upvotes: 1