Valentina
Valentina

Reputation: 51

Grouping and subsetting with multiple conditions

I want to subset all this long-format dataset df for "Ca" and sum the percentages of the others elements into a new variable "other_elem" within the same column "Elements".

 tibble [225 × 8] (S3: tbl_df/tbl/data.frame)
 $ site         : chr [1:225] "Nahal_Rahaf_2" "Nahal_Rahaf_2" "Nahal_Rahaf_2" "Nahal_Rahaf_2" ...
 $ site_type    : chr [1:225] "Rockshelter" "Rockshelter" "Rockshelter" "Rockshelter" ...
 $ aDNA_presence: chr [1:225] "No" "No" "No" "No" ...
 $ aDNA_quantity: chr [1:225] "No" "No" "No" "No" ...
 $ sample       : chr [1:225] "s1" "s1" "s1" "s1" ...
 $ Input_mg     : num [1:225] 0 0 0 0 0 0 0 0 0 0 ...
 $ Elements     : chr [1:225] "Mg" "Al" "Si" "P" ...
 $ Percentages  : num [1:225] 4.61 4.205 26.095 0.735 1.795 ...


head(df)

site          site_type   aDNA_presence aDNA_quantity sample Input_mg Elements Percentages
  <chr>         <chr>       <chr>         <chr>         <chr>     <dbl> <chr>          <dbl>
1 Nahal_Rahaf_2 Rockshelter No            No            s1            0 Mg             4.61 
2 Nahal_Rahaf_2 Rockshelter No            No            s1            0 Al             4.20 
3 Nahal_Rahaf_2 Rockshelter No            No            s1            0 Si            26.1  
4 Nahal_Rahaf_2 Rockshelter No            No            s1            0 P              0.735
5 Nahal_Rahaf_2 Rockshelter No            No            s1            0 K              1.80 
6 Nahal_Rahaf_2 Rockshelter No            No            s1            0 Ca            52.1  

I have started writing the code but I get stuck to here:

group_by(Elements) %>% 
  summarise(Percentages = sum(Percentages))

Upvotes: 0

Views: 76

Answers (2)

Darren Tsai
Darren Tsai

Reputation: 35604

You can group the data by all the columns except Percentages, overwrite Elements column by collapsing factors other than "Ca" into "Others", and then summarize each group with sum().

library(dplyr)

df %>%
  group_by(pick(-Percentages),
           Elements = replace(Elements, Elements != "Ca", "Others")) %>%
  summarise(Percentages = sum(Percentages), .groups = 'drop')

# # A tibble: 2 × 8
#   site          site_type   aDNA_presence aDNA_quantity sample Input_mg Elements  Percentages
#   <chr>         <chr>       <chr>         <chr>         <chr>     <int> <chr>           <dbl>
# 1 Nahal_Rahaf_2 Rockshelter No            No            s1            0 Ca               52.1
# 2 Nahal_Rahaf_2 Rockshelter No            No            s1            0 Others           37.4

Note: If your dplyr version is older than 1.1.0, replace pick(-Percentages) with across(-Percentages).


Data
df <- read.table(text = "
  site          site_type   aDNA_presence aDNA_quantity sample Input_mg Elements Percentages
1 Nahal_Rahaf_2 Rockshelter No            No            s1            0 Mg             4.61 
2 Nahal_Rahaf_2 Rockshelter No            No            s1            0 Al             4.20 
3 Nahal_Rahaf_2 Rockshelter No            No            s1            0 Si            26.1  
4 Nahal_Rahaf_2 Rockshelter No            No            s1            0 P              0.735
5 Nahal_Rahaf_2 Rockshelter No            No            s1            0 K              1.80 
6 Nahal_Rahaf_2 Rockshelter No            No            s1            0 Ca            52.1")

Upvotes: 1

Talha Asif
Talha Asif

Reputation: 401

To subset the data for "Ca" and sum the percentages of other elements into a new variable "other_elem", you can use the following code:

Data:

df <- data.frame(
  site = c("Nahal_Rahaf_2", "Nahal_Rahaf_2", "Nahal_Rahaf_2", "Nahal_Rahaf_2", "Nahal_Rahaf_2", "Nahal_Rahaf_2"),
  site_type = c("Rockshelter", "Rockshelter", "Rockshelter", "Rockshelter", "Rockshelter", "Rockshelter"),
  aDNA_presence = c("No", "No", "No", "No", "No", "No"),
  aDNA_quantity = c("No", "No", "No", "No", "No", "No"),
  sample = c("s1", "s1", "s1", "s1", "s1", "s1"),
  Input_mg = c(0, 0, 0, 0, 0, 0),
  Elements = c("Mg", "Al", "Si", "P", "K", "Ca"),
  Percentages = c(4.61, 4.205, 26.095, 0.735, 1.795, 52.1)
)

Code:

df %>% 
  filter(Elements != "Ca") %>% 
  group_by(site, sample) %>% 
  summarize(other_elem = sum(Percentages)) %>% 
  full_join(df %>% filter(Elements == "Ca"), by = c("site", "sample")) %>% 
  arrange(site, sample)

Output:

# Groups:   site [1]
  site          sample other_elem site_type   aDNA_presence aDNA_quantity Input_mg Elements Percentages
  <chr>         <chr>       <dbl> <chr>       <chr>         <chr>            <dbl> <chr>          <dbl>
1 Nahal_Rahaf_2 s1           37.4 Rockshelter No            No                   0 Ca              52.1

Upvotes: 1

Related Questions