Reputation: 51
I want to subset all this long-format dataset df for "Ca" and sum the percentages of the others elements into a new variable "other_elem" within the same column "Elements".
tibble [225 × 8] (S3: tbl_df/tbl/data.frame)
$ site : chr [1:225] "Nahal_Rahaf_2" "Nahal_Rahaf_2" "Nahal_Rahaf_2" "Nahal_Rahaf_2" ...
$ site_type : chr [1:225] "Rockshelter" "Rockshelter" "Rockshelter" "Rockshelter" ...
$ aDNA_presence: chr [1:225] "No" "No" "No" "No" ...
$ aDNA_quantity: chr [1:225] "No" "No" "No" "No" ...
$ sample : chr [1:225] "s1" "s1" "s1" "s1" ...
$ Input_mg : num [1:225] 0 0 0 0 0 0 0 0 0 0 ...
$ Elements : chr [1:225] "Mg" "Al" "Si" "P" ...
$ Percentages : num [1:225] 4.61 4.205 26.095 0.735 1.795 ...
head(df)
site site_type aDNA_presence aDNA_quantity sample Input_mg Elements Percentages
<chr> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl>
1 Nahal_Rahaf_2 Rockshelter No No s1 0 Mg 4.61
2 Nahal_Rahaf_2 Rockshelter No No s1 0 Al 4.20
3 Nahal_Rahaf_2 Rockshelter No No s1 0 Si 26.1
4 Nahal_Rahaf_2 Rockshelter No No s1 0 P 0.735
5 Nahal_Rahaf_2 Rockshelter No No s1 0 K 1.80
6 Nahal_Rahaf_2 Rockshelter No No s1 0 Ca 52.1
I have started writing the code but I get stuck to here:
group_by(Elements) %>%
summarise(Percentages = sum(Percentages))
Upvotes: 0
Views: 76
Reputation: 35604
You can group the data by all the columns except Percentages
, overwrite Elements
column by collapsing factors other than "Ca"
into "Others"
, and then summarize each group with sum()
.
library(dplyr)
df %>%
group_by(pick(-Percentages),
Elements = replace(Elements, Elements != "Ca", "Others")) %>%
summarise(Percentages = sum(Percentages), .groups = 'drop')
# # A tibble: 2 × 8
# site site_type aDNA_presence aDNA_quantity sample Input_mg Elements Percentages
# <chr> <chr> <chr> <chr> <chr> <int> <chr> <dbl>
# 1 Nahal_Rahaf_2 Rockshelter No No s1 0 Ca 52.1
# 2 Nahal_Rahaf_2 Rockshelter No No s1 0 Others 37.4
Note: If your dplyr
version is older than 1.1.0
, replace pick(-Percentages)
with across(-Percentages)
.
df <- read.table(text = "
site site_type aDNA_presence aDNA_quantity sample Input_mg Elements Percentages
1 Nahal_Rahaf_2 Rockshelter No No s1 0 Mg 4.61
2 Nahal_Rahaf_2 Rockshelter No No s1 0 Al 4.20
3 Nahal_Rahaf_2 Rockshelter No No s1 0 Si 26.1
4 Nahal_Rahaf_2 Rockshelter No No s1 0 P 0.735
5 Nahal_Rahaf_2 Rockshelter No No s1 0 K 1.80
6 Nahal_Rahaf_2 Rockshelter No No s1 0 Ca 52.1")
Upvotes: 1
Reputation: 401
To subset the data for "Ca" and sum the percentages of other elements into a new variable "other_elem", you can use the following code:
Data:
df <- data.frame(
site = c("Nahal_Rahaf_2", "Nahal_Rahaf_2", "Nahal_Rahaf_2", "Nahal_Rahaf_2", "Nahal_Rahaf_2", "Nahal_Rahaf_2"),
site_type = c("Rockshelter", "Rockshelter", "Rockshelter", "Rockshelter", "Rockshelter", "Rockshelter"),
aDNA_presence = c("No", "No", "No", "No", "No", "No"),
aDNA_quantity = c("No", "No", "No", "No", "No", "No"),
sample = c("s1", "s1", "s1", "s1", "s1", "s1"),
Input_mg = c(0, 0, 0, 0, 0, 0),
Elements = c("Mg", "Al", "Si", "P", "K", "Ca"),
Percentages = c(4.61, 4.205, 26.095, 0.735, 1.795, 52.1)
)
Code:
df %>%
filter(Elements != "Ca") %>%
group_by(site, sample) %>%
summarize(other_elem = sum(Percentages)) %>%
full_join(df %>% filter(Elements == "Ca"), by = c("site", "sample")) %>%
arrange(site, sample)
Output:
# Groups: site [1]
site sample other_elem site_type aDNA_presence aDNA_quantity Input_mg Elements Percentages
<chr> <chr> <dbl> <chr> <chr> <chr> <dbl> <chr> <dbl>
1 Nahal_Rahaf_2 s1 37.4 Rockshelter No No 0 Ca 52.1
Upvotes: 1