Reputation: 6581
I have the data frame:
names <- c("doe.jane", "doe.john", "smith.bob")
number <- c(3, 5, 1)
site <- c("A1", "A1", "A2")
df <- as.data.frame(matrix(c(site, names, number), 3))
names(df) <- c("site", "names", "number")
I need to replace the full names with last names only and then collapse the data frame, summing number within each site and last name, so that the output is:
names <- c("doe", "smith")
number <- c(8, 1)
site <- c("A1", "A2")
df <- as.data.frame(matrix(c(site, names, number), 2))
names(df) <- c("site", "names", "number")
Upvotes: 2
Views: 222
Reputation: 93938
Here's a version that uses a regex to extract the last-name part. I've recreated the data because the original construction stores the numbers as factors (thanks to mplourde for pointing that out).
#set up the data
names <- c("doe.jane","doe.john","smith.bob")
number <- c(3,5,1)
site <- c("A1","A1","A2")
df <- data.frame(site,names,number)
#get the first part of the name
df$names <- gsub("([[:alpha:]]+)\\.([[:alpha:]]+)","\\1",df$names)
#aggregate the data by site and name
dfnew <- aggregate(df["number"],df[c("site","names")],sum)
> dfnew
  site names number
1   A1   doe      8
2   A2 smith      1
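If every name has the form last.first, a simpler pattern may be enough. The following is only a sketch under that assumption; the vector names `full` and `last` are made up for illustration. It drops everything from the first dot onward using `sub()`:

```r
# Hypothetical input mirroring the question's names column
full <- c("doe.jane", "doe.john", "smith.bob")

# Remove the first literal dot and everything after it,
# which leaves only the part before the dot (the last name)
last <- sub("\\..*$", "", full)

print(last)
```

Unlike the two-group pattern above, this does not require the part after the dot to be purely alphabetic, so it should also cope with names such as doe.mary-ann.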
Upvotes: 1
Reputation: 44614
You'd want to do something like this:
last.names <- function(names) {
    names <- as.character(names)
    split.names <- strsplit(names, split='.', fixed=TRUE)
    sapply(split.names, function(x) x[1])
}
df <- within(df, names <- last.names(names))
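# as.character() first: as.numeric() on a factor returns level codes, not the values
df <- with(df, aggregate(list(number=as.numeric(as.character(number))), by=list(site=site, names=names), FUN=sum))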
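I'll point out that your definition of df is a little misguided. You really just need to say df <- data.frame(names, number, site). The way you're doing it, matrix() coerces everything to character, which leads to three factor columns in the resulting data.frame (or three character columns in R 4.0.0 and later, where stringsAsFactors defaults to FALSE). Either way, number is no longer numeric.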
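To make the difference concrete, here is a small sketch comparing the two ways of building the data frame. The names `via_matrix` and `direct` are just illustrative; the data is the same as in the question:

```r
names  <- c("doe.jane", "doe.john", "smith.bob")
number <- c(3, 5, 1)
site   <- c("A1", "A1", "A2")

# Going through matrix() forces all values to a single type (character),
# so the number column is not numeric in the resulting data frame
via_matrix <- as.data.frame(matrix(c(site, names, number), 3))
names(via_matrix) <- c("site", "names", "number")

# data.frame() keeps each vector's own type
direct <- data.frame(site, names, number)

is.numeric(via_matrix$number)   # FALSE
is.numeric(direct$number)       # TRUE

# Converting back safely works for both factor and character columns
recovered <- as.numeric(as.character(via_matrix$number))
```

Going through as.character() before as.numeric() is what keeps the conversion correct when the column is a factor. Calling as.numeric() directly on a factor would return the internal level codes instead of 3, 5, 1.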
Upvotes: 3