Reputation: 505
I have two data-sets:
a. A data frame that looks like this:
SpeciesA SpeciesB SpeciesC SpeciesD SpeciesE SpeciesY SpeciesZ
Site1 1 0 4 6 2 5 2
Site2 1 0 4 6 2 5 3
Site3 1 0 4 6 2 5 4
Site4 1 0 4 6 2 5 5
(Note: The row values are NOT identical. This is just for the purpose of representation here)
b. Another data-set that looks like this:
Order Species
Order1 SpeciesA
Order1 SpeciesB
Order2 SpeciesC
Order2 SpeciesD
Order3 SpeciesE
Notice that some species (eg. SpeciesZ, SpeciesY) does not have a corresponding "Order" in (b).
I want to match the Order column in data-set (2) to the corresponding Species in data-frame(1) and add up the values (if there are multiple species) under the same Order. When there is no corresponding Order for the species (eg.SpeciesY, SpeciesZ), I want to rename the column as NULL and then remove it (Note I need both the steps)
Pre-Final output
Order1 Order1 Order3 NULL NULL
Site1 1 10 2 5 2
Site2 1 10 2 5 3
Site3 1 10 2 5 4
Site4 1 10 2 5 5
Final output
Order1 Order2 Order3
Site1 1 10 2
Site2 1 10 2
Site3 1 10 2
Site4 1 10 2
This is an extension of the question asked here, which has code for the most part using dplyr
or melt
/reshape
functions. However, I am finding it difficult to perform the same analysis for this as it would spit out an error saying that there are no corresponding values for some species
Upvotes: 2
Views: 794
Reputation: 887851
One way is to create an index with match
, split
the subset of dataset using the Order
column, loop through the list
elements and get the rowSums
i1 <- match(colnames(df1), df2$Species, nomatch = 0)
data.frame(lapply(split.default(df1[i1], df2$Order[i1]), rowSums))
# Order1 Order2 Order3
#Site1 1 10 2
#Site2 1 10 2
#Site3 1 10 2
#Site4 1 10 2
Or we can convert the first dataset to 'long' format, join with the second, grouped by columns, get the sum
of the value column and reshape it back to 'wide'
library(tidyverse)
rownames_to_column(df1, "rn") %>%
gather(Species, Val, -rn) %>%
left_join(., df2, by = "Species") %>%
na.omit() %>%
group_by(rn, Order) %>%
summarise(Val = sum(Val)) %>%
spread(Order, Val)
Upvotes: 2