Marco
Marco

Reputation: 135

Manipulating a dataframe (with R)

I have a difficult aim to reach in order to facilitate my analyses; to the best of my knowledge there are no similar questions. I have a very long dataframe in Excel, which I reproduce here - in a simpler form - in R environment:

A1 <- cbind("sp1","sp2","sp3", "sp4", "sp7", "sp8") 
A2 <- cbind("sp1","sp3", "sp4", "sp7", "sp9") 
A3 <- cbind("sp5","sp6","sp7", "sp10") 
A4 <- cbind("sp1","sp2","sp7", "sp9", "sp10") 
A5 <- cbind("sp3","sp4") 

max_row <- 6

A1 <- c(A1, rep(NA, max_row - length(A1)))
A2 <- c(A2, rep(NA, max_row - length(A2))) 
A3 <- c(A3, rep(NA, max_row - length(A3))) 
A4 <- c(A4, rep(NA, max_row - length(A4))) 
A5 <- c(A5, rep(NA, max_row - length(A5))) 
df <-cbind(A1,A2, A3, A4, A5)
df <- as.data.frame(df)
df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)

To better understand the context in which I work, 'sp' are species, and A* are the sites where I detected a given species.

I want to convert this dataframe to another one structured as follow:

The dataframe I want to obtain in an automated way

The first column contains the names of the sites, and the following ones are all the species names (obviously, repeated only one time). Then, I need to assign '1' for the presence, and '0' for the absence in a given site.

I spent many many hours to try to reach my aim, but it is a problem too complex for my R syntax capacities.

Anyone could kindly help me?

Upvotes: 2

Views: 65

Answers (2)

andrew_reece
andrew_reece

Reputation: 21284

You can use gather and spread from tidyverse:

library(tidyverse)

df %>%
  gather(A, sp) %>%
  filter(!is.na(sp)) %>%
  group_by(A, sp) %>%
  count() %>%
  spread(sp, n) %>%
  replace(., is.na(.), 0)

  # A tibble: 5 x 11
# Groups:   A [5]
  A       sp1  sp10   sp2   sp3   sp4   sp5   sp6   sp7   sp8   sp9
* <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A1       1.    0.    1.    1.    1.    0.    0.    1.    1.    0.
2 A2       1.    0.    0.    1.    1.    0.    0.    1.    0.    1.
3 A3       0.    1.    0.    0.    0.    1.    1.    1.    0.    0.
4 A4       1.    1.    1.    0.    0.    0.    0.    1.    0.    1.
5 A5       0.    0.    0.    1.    1.    0.    0.    0.    0.    0.

Upvotes: 2

MKR
MKR

Reputation: 20095

You can gather your data in long format to process and add column showing presence of species on a site. Afterwards use reshape2::dcast to spread the data in wide format as:

library(tidyverse)
library(reshape2)

df %>% gather(Site, Species) %>%
  filter(!is.na(Species)) %>%
  mutate(value = 1) %>%      #Species are present on a site
  dcast(Site~Species, value.var = "value", fill = 0)

#   Site sp1 sp10 sp2 sp3 sp4 sp5 sp6 sp7 sp8 sp9
# 1   A1   1    0   1   1   1   0   0   1   1   0
# 2   A2   1    0   0   1   1   0   0   1   0   1
# 3   A3   0    1   0   0   0   1   1   1   0   0
# 4   A4   1    1   1   0   0   0   0   1   0   1
# 5   A5   0    0   0   1   1   0   0   0   0   0  

Upvotes: 2

Related Questions