Brenna

Reputation: 13

Putting 1 and 0 (presence/absence) where row names match a factor level

I have a list of over 400 species names. I also have two separate, smaller lists of names, all of which occur in that original list of 400. What I want to do is make a presence/absence (1 and 0) data frame based on the two smaller lists. The two smaller lists are stored as factors whose number of levels equals their length (each species name occurs once), so they have different lengths, and both are shorter than the original 400.

Here are the first few names in two of those lists:

    head(col.eff1)
    [1] Agapostemon sericeus  Agapostemon texanus   Agapostemon virescens Andrena rudbeckiae    Andrena simplex      
    [6] Anthophora terminalis
    head(inhs.eff1)
    [1] Agapostemon virescens Andrena carlini       Andrena personata     Andrena rudbeckiae    Augochlora pura      
    [6] Augochlorella aurata

I have tried the following to get the names:

    intersect(col.eff1,inhs.eff1) ## what names do both lists have in common
    setdiff(col.eff1,inhs.eff1) ## what names does 1 have that 2 does not
    setdiff(inhs.eff1,col.eff1) ## what names does 2 have that 1 does not

These all work once I sort the lists first, but they only give me the names.

But I need a single data frame with 400 rows (species names as row names) and two columns giving the presence of each species in the collections and in the sampling (col. and inhs.). I am also doing this whole thing four different times. Any help is appreciated, thank you!

Upvotes: 0

Views: 686

Answers (2)

Rui Barradas

Reputation: 76402

I would also use %in% like the_darkside did, but with a different approach.
First, the data.

    col.eff1 <-
    c("Agapostemon sericeus", "Agapostemon texanus", "Agapostemon virescens", 
    "Andrena rudbeckiae", "Andrena simplex", "Anthophora terminalis")

    inhs.eff1 <-
    c("Agapostemon virescens", "Andrena carlini", "Andrena personata", 
    "Andrena rudbeckiae", "Augochlora pura", "Augochlorella aurata")

Now, to have all the names in one vector, use union, not intersect or setdiff. Then, create the data.frame using this result and the original vectors.

    rn <- union(col.eff1, inhs.eff1)

    dat <- data.frame(col.eff1 = as.integer(rn %in% col.eff1),
                      inhs.eff1 = as.integer(rn %in% inhs.eff1)
    )
    row.names(dat) <- rn

    dat
    #                      col.eff1 inhs.eff1
    #Agapostemon sericeus         1         0
    #Agapostemon texanus          1         0
    #Agapostemon virescens        1         1
    #Andrena rudbeckiae           1         1
    #Andrena simplex              1         0
    #Anthophora terminalis        1         0
    #Andrena carlini              0         1
    #Andrena personata            0         1
    #Augochlora pura              0         1
    #Augochlorella aurata         0         1
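The question asks for one row per species in the full list of ~400, including species absent from both smaller lists, and for repeating the procedure four times. A minimal sketch under those assumptions: `all.species` stands in for the full species list (here just the union plus one made-up placeholder name), `presence_table()` is a hypothetical helper, not a base R function, and the species names are assumed to be unique so they can serve as row names. The only change from the approach above is that the row names come from the full list instead of `union()`, so an unsampled species gets 0 in every column.

```r
# Hypothetical helper: one row per name in `all_species`, one 0/1 column per
# vector in the named list `lists`. Factors are converted to character first.
presence_table <- function(all_species, lists) {
  all_species <- as.character(all_species)
  cols <- lapply(lists, function(x) as.integer(all_species %in% as.character(x)))
  dat <- as.data.frame(cols)
  row.names(dat) <- all_species  # assumes species names are unique
  dat
}

# The question's inputs are factors, so they are built as factors here
col.eff1 <- factor(c("Agapostemon sericeus", "Agapostemon texanus", "Agapostemon virescens",
                     "Andrena rudbeckiae", "Andrena simplex", "Anthophora terminalis"))
inhs.eff1 <- factor(c("Agapostemon virescens", "Andrena carlini", "Andrena personata",
                      "Andrena rudbeckiae", "Augochlora pura", "Augochlorella aurata"))

# Stand-in for the full list of ~400 species; "Hypothetical species" is a
# placeholder that appears in neither smaller list
all.species <- c(union(as.character(col.eff1), as.character(inhs.eff1)),
                 "Hypothetical species")

dat1 <- presence_table(all.species, list(col.eff1 = col.eff1, inhs.eff1 = inhs.eff1))
dat1["Hypothetical species", ]

# For the four repeats: keep each pair in a named list and apply the helper
pairs <- list(run1 = list(col = col.eff1, inhs = inhs.eff1))  # add the other pairs here
results <- lapply(pairs, function(p) presence_table(all.species, p))
```

Each element of `results` is then one presence/absence data frame with the same row names, which makes the four runs easy to compare or `cbind()` together.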

Upvotes: 1

iskandarblue
iskandarblue

Reputation: 7526

It will make things easier if you convert your factors to character to create a string vector.

First, convert your list of original species names and your separate species lists to character vectors:

    originalSpecies <- c("dog", "cat", "mouse", "monkey", "bird")
    # if your names are stored as factors, convert them with as.character()
    originalSpecies <- as.character(originalSpecies)

    listA <- c("dog", "cat", "orangutan")
    listB <- c("monkey", "rat", "hippopotamus")

Then use %in% to check whether the names in your smaller lists are in your original list, and ifelse() to code them as 1 or 0:

    > ifelse(listA %in% originalSpecies, 1, 0)
    [1] 1 1 0
    > ifelse(listB %in% originalSpecies, 1, 0)
    [1] 1 0 0
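`listA %in% originalSpecies` gives one value per name in the smaller list. For a table with one row per original species, as the question asks, the same `%in%`/`ifelse()` idea can be applied in the other direction. A minimal sketch using the toy vectors above, with the smaller lists stored as factors only to mimic the question's input; the column names `listA` and `listB` are chosen just for illustration:

```r
originalSpecies <- c("dog", "cat", "mouse", "monkey", "bird")
# Stored as factors here to mimic the question's input
listA <- factor(c("dog", "cat", "orangutan"))
listB <- factor(c("monkey", "rat", "hippopotamus"))

# Convert the factors to character vectors first
listA <- as.character(listA)
listB <- as.character(listB)

# For each original species, code whether it appears in each smaller list
presence <- data.frame(listA = ifelse(originalSpecies %in% listA, 1, 0),
                       listB = ifelse(originalSpecies %in% listB, 1, 0),
                       row.names = originalSpecies)
presence
```

Here "dog" and "cat" get 1 in `listA`, "monkey" gets 1 in `listB`, and "mouse" and "bird" get 0 in both; names that occur only in a smaller list (such as "orangutan") do not get a row.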

Upvotes: 1
