Share
Share

Reputation: 505

Replacing column names from another dataframe with missing column names in R?

I have two data-sets:

a. A data frame that looks like this:

        SpeciesA  SpeciesB  SpeciesC  SpeciesD  SpeciesE  SpeciesY SpeciesZ
Site1     1          0        4        6          2        5        2
Site2     1          0        4        6          2        5        3
Site3     1          0        4        6          2        5        4
Site4     1          0        4        6          2        5        5

(Note: The row values are NOT identical. This is just for the purpose of representation here)

b. Another data-set that looks like this:

Order          Species
Order1         SpeciesA
Order1         SpeciesB
Order2         SpeciesC
Order2         SpeciesD
Order3         SpeciesE

Notice that some species (eg. SpeciesZ, SpeciesY) does not have a corresponding "Order" in (b).

I want to match the Order column in data-set (2) to the corresponding Species in data-frame(1) and add up the values (if there are multiple species) under the same Order. When there is no corresponding Order for the species (eg.SpeciesY, SpeciesZ), I want to rename the column as NULL and then remove it (Note I need both the steps)

Pre-Final output

         Order1    Order1   Order3    NULL   NULL
Site1     1          10       2        5     2
Site2     1          10       2        5     3
Site3     1          10       2        5     4
Site4     1          10       2        5     5

Final output

         Order1      Order2    Order3   
Site1     1             10          2                
Site2     1             10          2                    
Site3     1             10          2                    
Site4     1             10          2            

This is an extension of the question asked here, which has code for the most part using dplyr or melt/reshape functions. However, I am finding it difficult to perform the same analysis for this as it would spit out an error saying that there are no corresponding values for some species

Upvotes: 2

Views: 794

Answers (1)

akrun
akrun

Reputation: 887851

One way is to create an index with match, split the subset of dataset using the Order column, loop through the list elements and get the rowSums

i1 <- match(colnames(df1), df2$Species, nomatch = 0)
data.frame(lapply(split.default(df1[i1], df2$Order[i1]), rowSums))
#      Order1 Order2 Order3
#Site1      1     10      2
#Site2      1     10      2
#Site3      1     10      2
#Site4      1     10      2

Or we can convert the first dataset to 'long' format, join with the second, grouped by columns, get the sum of the value column and reshape it back to 'wide'

library(tidyverse)
rownames_to_column(df1, "rn") %>%
      gather(Species, Val, -rn) %>% 
      left_join(., df2, by = "Species") %>%
      na.omit() %>%
      group_by(rn, Order) %>% 
      summarise(Val = sum(Val))  %>%
      spread(Order, Val)

Upvotes: 2

Related Questions