Eonear Black

Reputation: 43

Combining data in the same column

I'm researching the number of individuals counted on four different sampling days in 9 different town districts, so 4 counts at each of 9 locations.

I was able to plot Sampling 1, 2, 3 and 4 independently of each other. However, I need at least 60 counted individuals before I can use the data for further statistics, and some samplings did not reach this threshold, so I have to cluster the data. This is done by adding Sampling 1 and Sampling 2 of every town district together to see whether the two sampling days combined get over the threshold of 60.

Now I have to add Sampling 1+2 and Sampling 3+4 together in order to create a ggplot similar to the one below, but with Sampling 1+2 and Sampling 3+4 on the x-axis instead of Sampling 1, 2, 3 and 4.

(image: 4 Samplings ggplot)

The code for the ggplot is:

WP +
  geom_point(aes(x=Sampling, y=Individuals, colour=TownDistrict)) +
  ylab("Individuals") +
  xlab("Sampling") +
  ggtitle("Absolute amount of individuals observed over time per sampling per Town district") +
  scale_x_continuous(breaks = pretty_breaks(1)) +
  scale_y_continuous(breaks = pretty_breaks(n=10)) +
  geom_hline(yintercept = 60, colour="red") +
  geom_line(aes(Sampling, Individuals, colour=TownDistrict, group=TownDistrict))

The Sampling column is numeric, with values ranging from 1 to 4.

I also included my dataset to provide an overview of the kind of data I'm working with.

(image: Dataset)

I have tried using

install.packages("car") 
library(car) 
library(carData) 
install.packages("forcats") 
library(forcats) 

class(x) 
[1] "factor"  
levels(x) 
[1] "1" "2" "3" "4" 
str(x) 
 Factor w/ 4 levels "1","2","3","4": 1 2 3 4 

x 
 [1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 
Levels: 1 2 3 4 

recode(x, "c('1', '2')='Sampling 1+2';c('3', '4') = 'Sampling 3+4'") 
[1] Sampling 1+2 Sampling 1+2 Sampling 3+4 Sampling 3+4 
Levels: Sampling 1+2 Sampling 3+4 

but none of this code changes Sampling 1, 2, 3 and 4 into a combination of Sampling 1+2 and Sampling 3+4 per town district.

I hope I have described my problem in enough detail.

As requested in the comments:

dput(WPT)
structure(list(Individuals = c(4, 11, 17, 21, 49, 68, 69, 76, 
24, 85, 69, 61, 86, 69, 86, 71, 82, 53, 83, 76, 84, 99, 99, 86, 
79, 134, 124, 112, 111, 90, 122, 104, 81, 102, 115, 95), Sampling = c(1, 
2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 
3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4), TD = c(1, 1, 1, 1, 
2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 
7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9), TownDistrict = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 
5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 
9L, 9L, 9L), levels = c("1", "2", "3", "4", "5", "6", "7", "8", 
"9"), class = "factor"), SMPL = structure(c(1L, 2L, 3L, 4L, 1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), levels = 
c("1", 
"2", "3", "4"), class = "factor")), row.names = c(NA, -36L), class = 
c("tbl_df", 
"tbl", "data.frame"))

Upvotes: 0

Views: 84

Answers (2)

Nick Glättli

Reputation: 466

Here is my approach: use dplyr to recode the four samplings into two groups ("1+2" and "3+4"), sum the individuals per town district and group, and plot the result.

library(tidyverse)

df <- structure(list(Individuals = c(4, 11, 17, 21, 49, 68, 69, 76, 
                                     24, 85, 69, 61, 86, 69, 86, 71, 82, 53, 83, 76, 84, 99, 99, 86, 
                                     79, 134, 124, 112, 111, 90, 122, 104, 81, 102, 115, 95),
                     Sampling = c(1, 
                                   2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 
                                   3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4), 
                     TD = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 
                            7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9), 
                     TownDistrict = structure(c(1L,1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 
                                                5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L), 
                                              levels = c("1", "2", "3", "4", "5", "6", "7", "8",  "9"), 
                                              class = "factor"), SMPL = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L,
                                                                                    4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
                                                                                    1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), 
                                                                                  levels = c("1", "2", "3", "4"), 
                                                                                  class = "factor")), row.names = c(NA, -36L),class = c("tbl_df", "tbl", "data.frame"))

df %>%
  mutate("sampling2"=
           case_when(
             Sampling %in% c(1,2) ~ "1+2",
             Sampling %in% c(3,4) ~ "3+4"
           )) %>%
  group_by(TownDistrict, sampling2) %>%
  summarise(Individuals= sum(Individuals)) %>%
  ggplot(aes(x=sampling2, y= Individuals, color= TownDistrict, group=TownDistrict))+
  geom_point()+
  geom_line()
#> `summarise()` has grouped output by 'TownDistrict'. You can override using the
#> `.groups` argument.

Created on 2023-02-28 with reprex v2.0.2
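
If the threshold check from the question should also appear in the result, the same idea can be extended as in the following sketch. It rebuilds the question's counts compactly and assumes the samplings and districts are ordered as in the `dput()` output; the names `combined`, `SamplingPair` and `ReachesThreshold` are illustrative and not part of the original code.

```r
library(dplyr)
library(ggplot2)

# Same counts as in the question's dput(), rebuilt compactly:
# 9 town districts, 4 samplings each.
df <- data.frame(
  Individuals = c(4, 11, 17, 21, 49, 68, 69, 76, 24, 85, 69, 61,
                  86, 69, 86, 71, 82, 53, 83, 76, 84, 99, 99, 86,
                  79, 134, 124, 112, 111, 90, 122, 104, 81, 102, 115, 95),
  Sampling = rep(1:4, times = 9),
  TownDistrict = factor(rep(1:9, each = 4))
)

combined <- df %>%
  # Label each row with the pair of samplings it belongs to
  mutate(SamplingPair = if_else(Sampling <= 2, "Sampling 1+2", "Sampling 3+4")) %>%
  # Sum the counts per town district and pair
  group_by(TownDistrict, SamplingPair) %>%
  summarise(Individuals = sum(Individuals), .groups = "drop") %>%
  # Flag whether the combined count reaches the threshold of 60
  mutate(ReachesThreshold = Individuals >= 60)

# Same layers as above, plus the red threshold line from the question's plot
p <- ggplot(combined, aes(x = SamplingPair, y = Individuals,
                          colour = TownDistrict, group = TownDistrict)) +
  geom_point() +
  geom_line() +
  geom_hline(yintercept = 60, colour = "red") +
  ylab("Individuals") +
  xlab("Sampling")

print(p)

# Combined counts that are still below the threshold
print(filter(combined, !ReachesThreshold))
```

With the question's data, only town district 1 stays below 60 after combining (15 for Sampling 1+2 and 38 for Sampling 3+4).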

Upvotes: 0

Dimitri

Reputation: 135

It is usually simpler to filter your dataset before plotting it. You could:

Step 1: create a new "Sampling" column, where you put "1+2" or "3+4" whenever you have fewer than 60 individuals.

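#I use dplyr a lot

library(dplyr)
# as.character() keeps every branch of case_when() the same type;
# the last branch covers samplings 3 and 4
data=data %>% mutate(newSampling=case_when(Individuals>=60 ~ as.character(Sampling),
Individuals<60 & (Sampling=="1"|Sampling=="2") ~"1+2",
Individuals<60 & (Sampling=="3"|Sampling=="4") ~"3+4"))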

Step 2: sum the individuals for each "newSampling" and "TownDistrict"

data=data %>% group_by(newSampling,TownDistrict) %>% 
mutate(IndividualsSum=sum(Individuals)) %>% ungroup()

Step 3: create a new variable to know whether the group should be shown in your plot or not

data=data %>% mutate(should_be_plotted=(IndividualsSum>=60))

Step 4: filter your data on "should_be_plotted" before plotting

data %>% filter(should_be_plotted==TRUE) %>% ggplot()
#Rest of the plot's code
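
One possible end-to-end reading of these steps is sketched below on the question's data. It assumes that a pair of samplings is merged whenever either of its two counts is below 60, which is an interpretation and not something the steps above state explicitly. The names `pair`, `needs_merge` and `result` are illustrative.

```r
library(dplyr)

# Question's counts, rebuilt compactly (9 town districts x 4 samplings)
data <- data.frame(
  Individuals = c(4, 11, 17, 21, 49, 68, 69, 76, 24, 85, 69, 61,
                  86, 69, 86, 71, 82, 53, 83, 76, 84, 99, 99, 86,
                  79, 134, 124, 112, 111, 90, 122, 104, 81, 102, 115, 95),
  Sampling = rep(1:4, times = 9),
  TownDistrict = factor(rep(1:9, each = 4))
)

result <- data %>%
  # Step 1: decide per district and pair whether merging is needed
  mutate(pair = if_else(Sampling %in% c(1, 2), "1+2", "3+4")) %>%
  group_by(TownDistrict, pair) %>%
  mutate(needs_merge = any(Individuals < 60),   # assumption: merge if either count is low
         newSampling = if_else(needs_merge, pair, as.character(Sampling))) %>%
  # Step 2: sum the individuals per district and new sampling label
  group_by(TownDistrict, newSampling) %>%
  summarise(IndividualsSum = sum(Individuals), .groups = "drop") %>%
  # Step 3: flag the groups that reach the threshold
  mutate(should_be_plotted = IndividualsSum >= 60)

# Step 4: keep only the groups that can be plotted
plot_data <- filter(result, should_be_plotted)
print(plot_data)
```

Under this reading, samplings that already reach 60 on their own keep their original label, so the x-axis of the final plot mixes single samplings and merged pairs. If every district should show exactly two points, summing all pairs unconditionally is the simpler choice.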

Upvotes: 0
