Reputation: 471
I'm trying to use mutate in dplyr to process strings, but I'm not getting the output I want (see below): instead of operating row by row, mutate takes the result for the first element and fills it down the whole column. Could someone help me understand what I'm doing wrong and how to tweak this code so it works properly?
library(dplyr)

short.idfun = function(longid)
{
  x = strsplit(longid, "_")
  y = x[[1]]
  study = substr(y[1], 8, nchar(y[1]))
  subj = y[length(y)]
  subj = substr(subj, regexpr("[^0]", subj), nchar(subj)) # remove leading zeros
  shortid = paste(study, subj, sep = "-")
  return(shortid)
}

data = data.frame(test = c("1234567Andy_003_003003", "1234567Beth_004_003004", "1234567Char_003_003005"), stringsAsFactors = FALSE)
data = mutate(data, shortid = short.idfun(test))
print(data)
#### Below is my output
#                    test   shortid
#1 1234567Andy_003_003003 Andy-3003
#2 1234567Beth_004_003004 Andy-3003
#3 1234567Char_003_003005 Andy-3003
#### This is the behavior I was hoping for
#                    test   shortid
#1 1234567Andy_003_003003 Andy-3003
#2 1234567Beth_004_003004 Beth-3004
#3 1234567Char_003_003005 Char-3005
Upvotes: 3
Views: 270
Reputation: 21621
Another alternative is to use rowwise():
data %>%
  rowwise() %>%
  mutate(shortid = short.idfun(test))
Which gives:
#Source: local data frame [3 x 2]
#Groups: <by row>
#
# test shortid
# (chr) (chr)
#1 1234567Andy_003_003003 Andy-3003
#2 1234567Beth_004_003004 Beth-3004
#3 1234567Char_003_003005 Char-3005
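One thing to keep in mind: the result of rowwise() stays grouped by row, so later verbs will keep operating one row at a time. A minimal sketch of the same pipeline with ungroup() added at the end, assuming the short.idfun from the question (repeated here so the snippet runs on its own); the name result is just for illustration:

```r
library(dplyr)

# Function from the question: works on ONE id at a time.
short.idfun <- function(longid) {
  x <- strsplit(longid, "_")
  y <- x[[1]]
  study <- substr(y[1], 8, nchar(y[1]))
  subj <- y[length(y)]
  subj <- substr(subj, regexpr("[^0]", subj), nchar(subj)) # remove leading zeros
  paste(study, subj, sep = "-")
}

data <- data.frame(
  test = c("1234567Andy_003_003003", "1234567Beth_004_003004", "1234567Char_003_003005"),
  stringsAsFactors = FALSE
)

result <- data %>%
  rowwise() %>%                            # evaluate mutate() once per row
  mutate(shortid = short.idfun(test)) %>%
  ungroup()                                # drop the by-row grouping afterwards

print(result)
```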
Upvotes: 1
Reputation: 17369
The issue is that your function needs a little help vectorizing. You can run it through vapply to get what you're after.
data = data.frame(test = c("1234567Andy_003_003003", "1234567Beth_004_003004", "1234567Char_003_003005"), stringsAsFactors = FALSE)
data = mutate(data,
              shortid = vapply(test, short.idfun, character(1)))
print(data)
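A small side note, as a sketch rather than a required change: when vapply is given a character vector, it uses the input strings as names on the result by default. If you would rather have a plain unnamed vector in the new column, you can pass USE.NAMES = FALSE. The snippet below uses base R only and repeats the function from the question so it runs on its own; the names ids, with_names and without_names are just for illustration.

```r
# Function from the question: works on ONE id at a time.
short.idfun <- function(longid) {
  x <- strsplit(longid, "_")
  y <- x[[1]]
  study <- substr(y[1], 8, nchar(y[1]))
  subj <- y[length(y)]
  subj <- substr(subj, regexpr("[^0]", subj), nchar(subj)) # remove leading zeros
  paste(study, subj, sep = "-")
}

ids <- c("1234567Andy_003_003003", "1234567Beth_004_003004", "1234567Char_003_003005")

# Default: the result carries the input strings as names.
with_names <- vapply(ids, short.idfun, character(1))

# USE.NAMES = FALSE returns a plain character vector.
without_names <- vapply(ids, short.idfun, character(1), USE.NAMES = FALSE)

print(names(with_names))
print(without_names)
```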
To see why you got the result you did, we can take a closer look at the first few lines of your function.
longid = data$test
(x <- strsplit(longid, "_"))
[[1]]
[1] "1234567Andy" "003" "003003"
[[2]]
[1] "1234567Beth" "004" "003004"
[[3]]
[1] "1234567Char" "003" "003005"
Everything looks good so far, but now you define y.
(y = x[[1]])
[1] "1234567Andy" "003" "003003"
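By calling x[[1]], you pulled out only the first element of the list x (the split pieces for the first row), not the first element of each vector in x. Everything after that line works on that one row, so the function returns a single string and mutate recycles it down the column. You could also revise your function to pull the pieces out of every element of x, e.g. y <- vapply(x, function(v) v[1], character(1)) for the first piece (and the same idea with v[length(v)] for the last piece that feeds subj), and skip the vapply in mutate. Either way should work.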
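For what it's worth, here is a rough sketch of what that revised, vectorized function could look like. The name short_idfun_vec and the intermediate variable names are mine, not from the question; the string handling is otherwise the same as in the original function.

```r
# Vectorized variant: accepts a whole character vector of ids.
short_idfun_vec <- function(longid) {
  parts <- strsplit(longid, "_")

  # First and last piece of EACH id, not just of the first one.
  first <- vapply(parts, function(v) v[1], character(1))
  last  <- vapply(parts, function(v) v[length(v)], character(1))

  # substr(), nchar() and regexpr() are already vectorized.
  study <- substr(first, 8, nchar(first))
  subj  <- substr(last, regexpr("[^0]", last), nchar(last)) # remove leading zeros

  paste(study, subj, sep = "-")
}

ids <- c("1234567Andy_003_003003", "1234567Beth_004_003004", "1234567Char_003_003005")
shortids <- short_idfun_vec(ids)
print(shortids)
```

With a function like this, mutate(data, shortid = short_idfun_vec(test)) should work directly, with no vapply or rowwise() around it.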
Upvotes: 0