Reputation: 279
I am doing an exercise to practice writing functions. The problem asks for my function to reproduce the same output as the separate function (which lives in tidyr rather than dplyr).
I have the following data frame:
df <- data.frame(dates = c("2005-06-29", "2005-07-16", "2005-12-01"),
values = c("F:62:130", "F:68:149", "M:68:160"),
stringsAsFactors = FALSE)
I want to separate the "values" column into three separate columns (split at the colon) and drop the "values" column in the final data frame to look like:
dates gender ht wt
1 2005-06-29 F 62 130
2 2005-07-16 F 68 149
3 2005-12-01 M 68 160
The problem I'm running into is naming the new columns in my function. This is what I have so far:
into <- c()
names(into) <- c(a = "", b = "", c = "")
But when I run my new function, I'm getting an error that I'm attempting to set an attribute on NULL.
When I remove the into/names(into) stuff, I get the following (with the wrong new column names):
dates values a b c
1 2005-06-29 F:62:130 F 62 130
2 2005-07-16 F:68:149 F 68 149
3 2005-12-01 M:68:160 F 68 160
How do I create an into argument in the function that lets me name the columns whatever I want?
Upvotes: 1
Views: 52
Reputation: 887213
In base R, we can use substring
transform(df, a = substr(values, 1, 1),
b = substring(values, 3, 4),
wt = substring(values, 6))
Or another easier option is to read with read.table, specifying the sep as :, and create columns on the original dataset either by assignment or by cbinding
df[c('a', 'b', 'wt')] <- read.table(text = df$values, sep=":", header = FALSE)
df
# dates values a b wt
#1 2005-06-29 F:62:130 F 62 130
#2 2005-07-16 F:68:149 F 68 149
#3 2005-12-01 M:68:160 M 68 160
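One caveat with the read.table approach: it type-converts each column, so a column that happens to contain only "F" (or only "T") values would be read as logical rather than character. A minimal sketch of one way to guard against that, assuming the gender/ht/wt names from the question and that ht and wt should end up as integers:

```r
df <- data.frame(dates = c("2005-06-29", "2005-07-16", "2005-12-01"),
                 values = c("F:62:130", "F:68:149", "M:68:160"),
                 stringsAsFactors = FALSE)

# colClasses = "character" switches off automatic type conversion, so a column
# made up only of "F" values is not silently turned into logical FALSE
parts <- read.table(text = df$values, sep = ":", header = FALSE,
                    colClasses = "character",
                    col.names = c("gender", "ht", "wt"))

# Convert the numeric pieces explicitly (assumption: both are whole numbers)
parts$ht <- as.integer(parts$ht)
parts$wt <- as.integer(parts$wt)

# Keep only the dates column from the original, dropping "values"
res <- cbind(df["dates"], parts)
```

Here res holds dates, gender, ht and wt, with gender kept as character regardless of which letters appear in it.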
The OP's function can be changed to
myfunc <- function(df, colnum = 2, into = c("a", "b", "c"), sep = ":") {
# Use "colnum" to access the specified column of "df"
j1 <- colnum
colnum <- df[ , colnum]
# Split the selected column using the specified separator
storage <- strsplit(colnum, split = sep)
# Take the first/second/third element of each split and store them in separate vectors
a <- sapply(storage, function(x) x[1])
b <- sapply(storage, function(x) x[2])
c <- sapply(storage, function(x) x[3])
out <- cbind(df, setNames(list(a, b, c), into))
out[setdiff(names(out), names(df)[j1])]
}
myfunc(df)
# dates a b c
#1 2005-06-29 F 62 130
#2 2005-07-16 F 68 149
#3 2005-12-01 M 68 160
myfunc(df, into = c('a1', 'b1', 'c1'))
# dates a1 b1 c1
#1 2005-06-29 F 62 130
#2 2005-07-16 F 68 149
#3 2005-12-01 M 68 160
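As for the error in the question: c() with no arguments returns NULL, and names<- cannot set an attribute on NULL. The into argument only needs to be a plain character vector of column names, so no names() call is required. The function above is hard-coded to three pieces; below is a rough sketch of a variant that creates as many columns as into has entries. The name my_separate and the choice to pass the column by name are my own assumptions, not part of the exercise:

```r
df <- data.frame(dates = c("2005-06-29", "2005-07-16", "2005-12-01"),
                 values = c("F:62:130", "F:68:149", "M:68:160"),
                 stringsAsFactors = FALSE)

is.null(c())  # TRUE, which is why names(into) <- ... fails on into <- c()

# Hypothetical helper: col is a column name, into is a character vector of new names
my_separate <- function(df, col, into, sep = ":") {
  # fixed = TRUE treats sep as a literal string rather than a regular expression
  pieces <- strsplit(df[[col]], split = sep, fixed = TRUE)
  # Build one vector per requested name; a missing piece becomes NA
  new_cols <- lapply(seq_along(into), function(i) {
    vapply(pieces, function(x) x[i], character(1))
  })
  names(new_cols) <- into
  # Drop the original column and bind the new ones on
  cbind(df[setdiff(names(df), col)],
        as.data.frame(new_cols, stringsAsFactors = FALSE))
}

res <- my_separate(df, "values", into = c("gender", "ht", "wt"))
```

The new columns stay character here; wrapping them in as.integer() or type.convert() afterwards is a separate decision.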
Upvotes: 1
Reputation: 101733
Here is a base R solution
dfout <- cbind(df,`colnames<-`(do.call(rbind,strsplit(df$values,":")),c("a","b","wt")))
which gives
> dfout
dates values a b wt
1 2005-06-29 F:62:130 F 62 130
2 2005-07-16 F:68:149 F 68 149
3 2005-12-01 M:68:160 M 68 160
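The one-liner keeps the values column and leaves every new column as character. If the goal is the exact layout from the question, the same idea can be unpacked into steps. This is only a sketch, and it assumes every row splits into the same number of pieces (otherwise rbind recycles values with a warning):

```r
df <- data.frame(dates = c("2005-06-29", "2005-07-16", "2005-12-01"),
                 values = c("F:62:130", "F:68:149", "M:68:160"),
                 stringsAsFactors = FALSE)

# Stack the split pieces into a character matrix, one row per observation
m <- do.call(rbind, strsplit(df$values, ":", fixed = TRUE))
colnames(m) <- c("gender", "ht", "wt")

# Keep only the dates column and attach the new columns
dfout <- cbind(df["dates"], as.data.frame(m, stringsAsFactors = FALSE))

# Convert the numeric pieces (assumption: both are whole numbers)
dfout[c("ht", "wt")] <- lapply(dfout[c("ht", "wt")], as.integer)
```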
Upvotes: 0