starski
starski

Reputation: 369

How to sum values from one column based on specific conditions from other column in R?

I have a dataset that looks something like this:

df <- data.frame(plot = c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C"),
                 species = c("Fagus","Fagus","Quercus","Picea", "Abies","Fagus","Fagus","Quercus","Picea", "Abies","Fagus","Fagus","Quercus","Picea", "Abies"),
                 value =  sample(100, size = 15, replace = TRUE)) 

head(df)
  plot species value
1    A   Fagus    53
2    A   Fagus    48
3    A Quercus     5
4    A   Picea    25
5    A   Abies    12
6    B   Fagus    12

Now, I want to create a new data frame containing per plot values for share.conifers and share.broadleaves by basically summing the values with conditions applied for species. I thought about using case_when but I am not sure how to write the syntax:

df1 <- df %>% share.broadleaves = case_when(plot = plot & species = "Fagus" or species = "Quercus" ~ FUN="sum")

df1 <- df %>% share.conifers = case_when(plot = plot & species = "Abies" or species = "Picea" ~ FUN="sum")

I know this is not right, but I would like something like this.

Upvotes: 0

Views: 56

Answers (2)

harre
harre

Reputation: 7307

Using dplyr/tidyr:

First construct the group, do the calculation and then spread into columns.

library(dplyr)
library(tidyr)

df |>
  mutate(type = case_when(species %in% c("Fagus", "Quercus") ~ "broadleaves",
                          species %in% c("Abies", "Picea") ~ "conifers")) |>
  group_by(plot, type) |>
  summarise(share = sum(value)) |>
  ungroup() |>
  pivot_wider(values_from = "share", names_from = "type", names_prefix = "share.")

Output:

# A tibble: 3 × 3
  plot  share.broadleaves share.conifers
  <chr>             <int>          <int>
1 A                   159             77
2 B                    53             42
3 C                   204             63

I am not sure if you want to sum or get the share, but the code could easily be adapted to whatever goal you have.

Upvotes: 1

deschen
deschen

Reputation: 11026

One way could just be summarizing by plot and species:

library(dplyr)
df |>
  group_by(plot, species) |>
  summarize(share = sum(value))

If you really want to get the share of a specific species per plot you could also do:

df |>
  group_by(plot) |>
  summarize(share_certain_species = sum(value[species %in% c("Fagus", "Quercus")]) / sum(value))

which gives:

# A tibble: 3 × 2
  plot  share_certain_species
  <chr>                 <dbl>
1 A                     0.546
2 B                     0.583
3 C                     0.480

Upvotes: 0

Related Questions