Reputation: 13
I have a list of over 400 species names. I also have two separate lists of names, each a subset of that original 400. What I want to do is make a presence/absence (1 and 0) data frame based on the two smaller lists. The two smaller lists are stored as factors whose levels are their unique species names, so they have different lengths, and both are shorter than the original 400.
Here are some of the names in two of those four separate lists:
head(col.eff1)
[1] Agapostemon sericeus Agapostemon texanus Agapostemon virescens Andrena rudbeckiae Andrena simplex
[6] Anthophora terminalis
head(inhs.eff1)
[1] Agapostemon virescens Andrena carlini Andrena personata Andrena rudbeckiae Augochlora pura
[6] Augochlorella aurata
I have tried the following to get the names:
intersect(col.eff1,inhs.eff1) ## what names do both lists have in common
setdiff(col.eff1,inhs.eff1) ## what names does 1 have that 2 does not
setdiff(inhs.eff1,col.eff1) ## what names does 2 have that 1 does not
These all work once I sort the lists before executing the code, but they only give me the names.
What I need is a single data frame with 400 rows (row names as species names) and two columns for the presence of each species in collections and in sampling (col. and inhs.). I am also doing this whole thing four different times. Any help is appreciated, thank you!
Upvotes: 0
Views: 686
Reputation: 76402
I would also use %in%, like the_darkside did, but with a different approach.
First, the data.
col.eff1 <-
c("Agapostemon sericeus", "Agapostemon texanus", "Agapostemon virescens",
"Andrena rudbeckiae", "Andrena simplex", "Anthophora terminalis")
inhs.eff1 <-
c("Agapostemon virescens", "Andrena carlini", "Andrena personata",
"Andrena rudbeckiae", "Augochlora pura", "Augochlorella aurata")
Now, to have all the names in one vector, use union, not intersect or setdiff. Then, create the data.frame using this result and the original vectors.
rn <- union(col.eff1, inhs.eff1)
dat <- data.frame(col.eff1 = as.integer(rn %in% col.eff1),
inhs.eff1 = as.integer(rn %in% inhs.eff1)
)
row.names(dat) <- rn
dat
# col.eff1 inhs.eff1
#Agapostemon sericeus 1 0
#Agapostemon texanus 1 0
#Agapostemon virescens 1 1
#Andrena rudbeckiae 1 1
#Andrena simplex 1 0
#Anthophora terminalis 1 0
#Andrena carlini 0 1
#Andrena personata 0 1
#Augochlora pura 0 1
#Augochlorella aurata 0 1
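The question asks for one row per species in the full list of about 400 names, not just the union of the two smaller lists. A minimal sketch of that variant follows, assuming the full list is available as a character vector; `all.species` and `dat.full` are hypothetical names, and the six-name vector is only a stand-in for the real data.

```r
# Hypothetical stand-in for the asker's full list of ~400 species names
all.species <- c("Agapostemon sericeus", "Agapostemon texanus",
                 "Agapostemon virescens", "Andrena carlini",
                 "Andrena rudbeckiae", "Bombus impatiens")

# The smaller lists are factors in the question, so mimic that here
col.eff1  <- factor(c("Agapostemon sericeus", "Agapostemon texanus",
                      "Agapostemon virescens", "Andrena rudbeckiae"))
inhs.eff1 <- factor(c("Agapostemon virescens", "Andrena carlini",
                      "Andrena rudbeckiae"))

# One row per species in the full list; as.character() makes the
# comparison explicitly on names rather than on factor codes
dat.full <- data.frame(
  col.eff1  = as.integer(all.species %in% as.character(col.eff1)),
  inhs.eff1 = as.integer(all.species %in% as.character(inhs.eff1)),
  row.names = all.species
)
dat.full
```

Species that appear in neither smaller list (the last one in this toy example) get a 0 in both columns, which the union-based version cannot show because it never sees them.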
Upvotes: 1
Reputation: 7526
It will make things easier if you convert your factors to character to create a string vector.
First, convert your list of original species names and your other separate species lists to character:
originalSpecies <- c("dog", "cat", "mouse", "monkey", "bird")
originalSpecies <- as.character(originalSpecies)  # assign the result; needed when the input is a factor
listA <- c("dog", "cat", "orangutan")
listB <- c("monkey", "rat", "hippopotamus")
Then use ifelse() to code the species with a 1 or a 0 and %in% to see if names in your smaller lists are in your original list
> ifelse(listA %in% originalSpecies, 1, 0)
[1] 1 1 0
> ifelse(listB %in% originalSpecies, 1, 0)
[1] 1 0 0
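Note that the vectors above have one entry per name in the smaller list. To get one entry per original species, as the question asks, the two sides of %in% can be swapped. The sketch below does this for several lists at once, which may help when repeating the task four times; `presence` is a hypothetical name and the data are the toy vectors from this answer.

```r
originalSpecies <- c("dog", "cat", "mouse", "monkey", "bird")
listA <- c("dog", "cat", "orangutan")
listB <- c("monkey", "rat", "hippopotamus")

# For each smaller list, test every original species for membership.
# sapply() returns a logical matrix with one column per list;
# adding 0L turns TRUE/FALSE into 1/0.
lists <- list(listA = listA, listB = listB)
presence <- as.data.frame(
  sapply(lists, function(x) originalSpecies %in% x) + 0L
)
rownames(presence) <- originalSpecies
presence
```

Adding another species list is then a matter of adding one more named element to `lists`.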
Upvotes: 1