Manipulating a dataframe (with R)

Question

I have a difficult aim to reach in order to facilitate my analyses; to the best of my knowledge there are no similar questions. I have a very long dataframe in Excel, which I reproduce here - in a simpler form - in R environment:

A1 <- cbind("sp1","sp2","sp3", "sp4", "sp7", "sp8") 
A2 <- cbind("sp1","sp3", "sp4", "sp7", "sp9") 
A3 <- cbind("sp5","sp6","sp7", "sp10") 
A4 <- cbind("sp1","sp2","sp7", "sp9", "sp10") 
A5 <- cbind("sp3","sp4") 

max_row <- 6

A1 <- c(A1, rep(NA, max_row - length(A1)))
A2 <- c(A2, rep(NA, max_row - length(A2))) 
A3 <- c(A3, rep(NA, max_row - length(A3))) 
A4 <- c(A4, rep(NA, max_row - length(A4))) 
A5 <- c(A5, rep(NA, max_row - length(A5))) 
df <-cbind(A1,A2, A3, A4, A5)
df <- as.data.frame(df)
df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)

To better understand the context in which I work, 'sp' are species, and A* are the sites where I detected a given species.

I want to convert this dataframe to another one structured as follow:

The dataframe I want to obtain in an automated way

The first column contains the names of the sites, and the following ones are all the species names (obviously, repeated only one time). Then, I need to assign '1' for the presence, and '0' for the absence in a given site.

I spent many many hours to try to reach my aim, but it is a problem too complex for my R syntax capacities.

Anyone could kindly help me?

MKR · Accepted Answer

You can gather your data in long format to process and add column showing presence of species on a site. Afterwards use reshape2::dcast to spread the data in wide format as:

library(tidyverse)
library(reshape2)

df %>% gather(Site, Species) %>%
  filter(!is.na(Species)) %>%
  mutate(value = 1) %>%      #Species are present on a site
  dcast(Site~Species, value.var = "value", fill = 0)

#   Site sp1 sp10 sp2 sp3 sp4 sp5 sp6 sp7 sp8 sp9
# 1   A1   1    0   1   1   1   0   0   1   1   0
# 2   A2   1    0   0   1   1   0   0   1   0   1
# 3   A3   0    1   0   0   0   1   1   1   0   0
# 4   A4   1    1   1   0   0   0   0   1   0   1
# 5   A5   0    0   0   1   1   0   0   0   0   0

Manipulating a dataframe (with R)

Answers (2)

Related Questions