Reputation: 311
I am trying to write a function in R which lumps species columns together within a data.frame.
(To elaborate a bit on what I'm doing: I have a data frame with multiple plant species for multiple sites and years. Some of the species were misidentified, so I'd like to group them at a more general level, e.g. spp a and spp b were mixed up throughout the years, so I'd like to create a new column called spp.ab in which the data for spp a and b are lumped together.)
Example:
spp.a spp.b
1 0
2 3
0 4
3 2
4 5
I'd like to eventually end up with a single column that displays the maximum value from the two species:
spp.ab
1
3
4
3
5
I've started writing a function which does this; however, I'm having trouble adding the new column to my data set and dropping the old ones. Could someone tell me what's wrong with my code?
lump <- function(db, spp.list, new.spp) { #input spp.list as c('spp.a', 'spp.b', ...)
mini.db <- subset(db, select=spp.list);
newcol <- as.vector(apply(mini.db, 1, max, na.rm=T));
db$new.spp <- newcol
db <- db[,names(db) %in% spp.list]
return(db)
}
When I call the function as such
test <- lump(db, c('spp.a', 'spp.b'), spp.ab)
test
all that pops up is the mini.db. Am I missing something with return()?
For reference, db is the database, spp.list is the species I want to lump together, and new.spp is what I would like the new column named.
Thanks for any help,
Paul
Upvotes: 3
Views: 7308
Reputation: 193687
While it seems like you've found your answer, I would suggest, instead, the pmax function:
> with(db, pmax(spp.a, spp.b))
[1] 1 3 4 3 5
You can use this with within or transform to mimic your function:
out <- within(db, spp.ab <- pmax(spp.a, spp.b))
out
#   spp.a spp.b spp.ab
# 1     1     0      1
# 2     2     3      3
# 3     0     4      4
# 4     3     2      3
# 5     4     5      5
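If you need to lump an arbitrary list of columns rather than two named ones, the same pmax idea can be wrapped in a function by passing the columns through do.call. This is only a sketch: the name lump_pmax and its arguments are made up for illustration, and it assumes every column in spp.list is numeric and that new.spp is given as a quoted string.

```r
# Hypothetical helper (not from the original posts): lump the columns named in
# spp.list into a single column called new.spp, keeping the row-wise maximum.
lump_pmax <- function(db, spp.list, new.spp) {
  # pmax() takes the columns as separate arguments, so hand them over with
  # do.call(); unname() stops column names being matched to pmax()'s arguments
  cols <- unname(as.list(db[spp.list]))
  db[[new.spp]] <- do.call(pmax, c(cols, na.rm = TRUE))
  # drop the original species columns; drop = FALSE keeps a data frame
  db[, !names(db) %in% spp.list, drop = FALSE]
}

db <- data.frame(spp.a = c(1, 2, 0, 3, 4), spp.b = c(0, 3, 4, 2, 5))
lump_pmax(db, c("spp.a", "spp.b"), "spp.ab")
```

As far as I know, pmax with na.rm = TRUE gives NA for a row where every value is missing, so the -Inf cleanup needed with apply(..., max, na.rm = TRUE) should not be necessary here.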
Upvotes: 2
Reputation: 311
I've figured it out...stupid mistake, of course. Here is the code that works:
lump <- function(db, spp.list, new.spp) { #input spp.list as a c('spp.a', 'spp.b', ...), and new.spp must be in quotes (e.g. 'new.spp')
mini.db <- subset(db, select=spp.list);
newcol <- as.vector(apply(mini.db, 1, max, na.rm=T));
newcol[newcol==-Inf] <- NA;
db[new.spp] <- newcol;
db <- db[, !names(db) %in% spp.list];
return(as.data.frame(db));
}
The key is in the db[new.spp] <- newcol; line. Apparently using this works, but using db$new.spp <- newcol does not. I also then added a ! to the line db <- db[,!names(db) %in% spp.list]. This was my biggest mistake.
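For anyone wondering why the two forms differ: $ takes the name after it literally, so db$new.spp creates a column that is actually called new.spp, whereas db[new.spp] and db[[new.spp]] use the string stored in the variable. A small illustration with made-up data:

```r
db <- data.frame(spp.a = c(1, 2), spp.b = c(0, 3))
new.spp <- "spp.ab"   # the name we want the new column to have

db$new.spp <- 0       # column is literally named "new.spp"
db[[new.spp]] <- 1    # column is named by the variable's value, "spp.ab"

names(db)
# [1] "spp.a"   "spp.b"   "new.spp" "spp.ab"
```

This is also why new.spp has to be passed to the function in quotes.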
Upvotes: 3