Reputation: 43
I'm studying the number of individuals counted on four different sampling days in 9 different town districts, so 4 counts at each of 9 locations.
I was able to plot Samplings 1, 2, 3 and 4 independently of each other. But I need at least 60 counted individuals to be able to use the data for further statistics, so I have to cluster the data, since some samplings did not reach this threshold. This is done by adding sampling 1 and sampling 2 of every town district together to see whether the two sampling days combined get over the threshold of 60.
Now I have to add Sampling 1+2 and Sampling 3+4 together in order to create a ggplot similar to the one below, but instead of Sampling 1, 2, 3 and 4 there should be Sampling 1+2 and Sampling 3+4.
(Image: 4 Samplings ggplot)
The code for the ggplot is
# pretty_breaks() comes from the scales package
WP +
  geom_point(aes(x = Sampling, y = Individuals, colour = TownDistrict)) +
  ylab("Individuals") +
  xlab("Sampling") +
  ggtitle("Absolute amount of individuals observed over time per sampling per Town district") +
  scale_x_continuous(breaks = pretty_breaks(1)) +
  scale_y_continuous(breaks = pretty_breaks(n = 10)) +
  geom_hline(yintercept = 60, colour = "red") +
  geom_line(aes(Sampling, Individuals, colour = TownDistrict, group = TownDistrict))
The Sampling column is numeric, with values ranging from 1 to 4.
I have also included my dataset to give an overview of the kind of data I'm working with.
(Image: Dataset)
I have tried using
install.packages("car")
library(car)
library(carData)
install.packages("forcats")
library(forcats)
class(x)
[1] "factor"
levels(x)
[1] "1" "2" "3" "4"
str(x)
Factor w/ 4 levels "1","2","3","4": 1 2 3 4
x
[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
Levels: 1 2 3 4
recode(x, "c('1', '2')='Sampling 1+2';c('3', '4') = 'Sampling 3+4'")
[1] Sampling 1+2 Sampling 1+2 Sampling 3+4 Sampling 3+4
Levels: Sampling 1+2 Sampling 3+4
but none of this code changes Sampling 1, 2, 3 and 4 into a combination of Sampling 1+2 and Sampling 3+4 per town district.
I hope I have described my problem in enough detail.
As requested in the comments:
dput(WPT)
structure(list(Individuals = c(4, 11, 17, 21, 49, 68, 69, 76,
24, 85, 69, 61, 86, 69, 86, 71, 82, 53, 83, 76, 84, 99, 99, 86,
79, 134, 124, 112, 111, 90, 122, 104, 81, 102, 115, 95),
Sampling = c(1,
2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2,
3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4), TD = c(1, 1, 1, 1,
2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7,
7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9), TownDistrict = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L,
5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L,
9L, 9L, 9L), levels = c("1", "2", "3", "4", "5", "6", "7", "8",
"9"), class = "factor"), SMPL = structure(c(1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), levels =
c("1",
"2", "3", "4"), class = "factor")), row.names = c(NA, -36L), class =
c("tbl_df",
"tbl", "data.frame"))
Upvotes: 0
Views: 84
Reputation: 466
Here is my approach: use dplyr to label samplings 1 and 2 as "1+2" and samplings 3 and 4 as "3+4", then sum the individuals per town district and pair before plotting.
library(tidyverse)
df <- structure(list(Individuals = c(4, 11, 17, 21, 49, 68, 69, 76,
24, 85, 69, 61, 86, 69, 86, 71, 82, 53, 83, 76, 84, 99, 99, 86,
79, 134, 124, 112, 111, 90, 122, 104, 81, 102, 115, 95),
Sampling = c(1,
2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2,
3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
TD = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7,
7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9),
TownDistrict = structure(c(1L,1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L,
5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L),
levels = c("1", "2", "3", "4", "5", "6", "7", "8", "9"),
class = "factor"), SMPL = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
levels = c("1", "2", "3", "4"),
class = "factor")), row.names = c(NA, -36L),class = c("tbl_df", "tbl", "data.frame"))
df %>%
  mutate("sampling2" =
    case_when(
      Sampling %in% c(1, 2) ~ "1+2",
      Sampling %in% c(3, 4) ~ "3+4"
    )) %>%
  group_by(TownDistrict, sampling2) %>%
  summarise(Individuals = sum(Individuals)) %>%
  ggplot(aes(x = sampling2, y = Individuals, color = TownDistrict, group = TownDistrict)) +
  geom_point() +
  geom_line()
#> `summarise()` has grouped output by 'TownDistrict'. You can override using the
#> `.groups` argument.
Created on 2023-02-28 with reprex v2.0.2
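To check the summed counts against the threshold of 60 from the question, the same grouping can be tabulated before plotting. This is only a minimal sketch: it rebuilds the relevant columns with data.frame(), and the names counts, paired and below_threshold are made up for this illustration.

```r
library(dplyr)

# Rebuild the relevant columns of the question's data: 9 districts x 4 samplings.
# (counts, paired and below_threshold are hypothetical names for this sketch.)
counts <- data.frame(
  TownDistrict = factor(rep(1:9, each = 4)),
  Sampling     = rep(1:4, times = 9),
  Individuals  = c(4, 11, 17, 21, 49, 68, 69, 76, 24, 85, 69, 61,
                   86, 69, 86, 71, 82, 53, 83, 76, 84, 99, 99, 86,
                   79, 134, 124, 112, 111, 90, 122, 104, 81, 102, 115, 95)
)

# Label samplings 1-2 and 3-4 as pairs, then sum the counts per district and pair.
paired <- counts %>%
  mutate(sampling2 = if_else(Sampling %in% c(1, 2), "1+2", "3+4")) %>%
  group_by(TownDistrict, sampling2) %>%
  summarise(Individuals = sum(Individuals), .groups = "drop")

# Pairs that still fall short of the threshold of 60 after summing.
below_threshold <- filter(paired, Individuals < 60)
print(below_threshold)
```

With these data, only town district 1 stays below 60 after pairing (15 for "1+2" and 38 for "3+4"). Adding geom_hline(yintercept = 60, colour = "red") to the plot, as in the question's code, would mark the threshold.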
Upvotes: 0
Reputation: 135
It is usually simpler to filter your dataset before plotting it. You could:
Step 1 : create a new "Sampling" column, where you put "1+2" or "3+4" whenever a sampling has fewer than 60 individuals.
#I use dplyr a lot
library(dplyr)
data=data %>% mutate(newSampling=case_when(
  # every branch of case_when() must return the same type, so Sampling is converted to character
  Individuals>=60 ~ as.character(Sampling),
  Individuals<60 & Sampling %in% c(1, 2) ~ "1+2",
  Individuals<60 & Sampling %in% c(3, 4) ~ "3+4"))
Step 2 : sum the individuals for each "newSampling" and "TownDistrict"
data=data %>% group_by(newSampling,TownDistrict) %>%
mutate(IndividualsSum=sum(Individuals)) %>% ungroup()
Step 3 : create a new variable to know if the group should be represented in your plot or not
data=data %>% mutate(should_be_plotted=(IndividualsSum>=60))
Step 4 : filter your data on "should_be_plotted" before plotting
data %>% filter(should_be_plotted==TRUE) %>% ggplot()
#Rest of the plot's code
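One thing to watch with the steps above: a sampling that already has 60 or more individuals keeps its own label, so it is not added to its low-count partner. A possible variation, not the exact method described above, merges both samplings of a pair whenever either one is below 60. The sketch below rebuilds the question's columns itself, and the names wpt, pair and merged are hypothetical.

```r
library(dplyr)

# Rebuild the relevant columns of the question's data (wpt is a hypothetical name).
wpt <- data.frame(
  TownDistrict = factor(rep(1:9, each = 4)),
  Sampling     = rep(1:4, times = 9),
  Individuals  = c(4, 11, 17, 21, 49, 68, 69, 76, 24, 85, 69, 61,
                   86, 69, 86, 71, 82, 53, 83, 76, 84, 99, 99, 86,
                   79, 134, 124, 112, 111, 90, 122, 104, 81, 102, 115, 95)
)

merged <- wpt %>%
  # Samplings 1-2 form the pair "1+2", samplings 3-4 the pair "3+4".
  mutate(pair = if_else(Sampling %in% c(1, 2), "1+2", "3+4")) %>%
  group_by(TownDistrict, pair) %>%
  # Merge the whole pair if any sampling in it is below the threshold;
  # otherwise each sampling keeps its own label.
  mutate(newSampling = if (any(Individuals < 60)) pair else as.character(Sampling)) %>%
  group_by(TownDistrict, newSampling) %>%
  summarise(IndividualsSum = sum(Individuals), .groups = "drop") %>%
  # Flag groups that reach the threshold after merging.
  mutate(should_be_plotted = IndividualsSum >= 60)

print(filter(merged, !should_be_plotted))
```

Filtering on should_be_plotted then works as in Step 4. Note that the x axis would mix single samplings and merged pairs, which may or may not be what you want in the final plot.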
Upvotes: 0