user12310746

Reputation: 279

Writing a function that reproduces the output of the "separate" function from tidyr in R

I am doing an exercise to practice writing functions. The exercise asks me to write a function that reproduces the output of the separate function from tidyr.

I have the following data frame:

df <- data.frame(dates = c("2005-06-29", "2005-07-16", "2005-12-01"), 
                  values = c("F:62:130", "F:68:149", "M:68:160"),
                  stringsAsFactors = FALSE)

I want to split the "values" column at the colons into three new columns and drop the original "values" column, so that the final data frame looks like:

       dates gender ht  wt
1 2005-06-29      F 62 130
2 2005-07-16      F 68 149
3 2005-12-01      M 68 160
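For comparison, the tidyr call this exercise is modelled on would look roughly like the sketch below. It assumes the tidyr package is installed (the `requireNamespace()` guard simply skips the call otherwise). Recent tidyr releases describe `separate()` as superseded by `separate_wider_delim()`, but it still works.

```r
df <- data.frame(dates = c("2005-06-29", "2005-07-16", "2005-12-01"),
                 values = c("F:62:130", "F:68:149", "M:68:160"),
                 stringsAsFactors = FALSE)

# Reference result from tidyr (assumes tidyr is installed; skipped otherwise)
if (requireNamespace("tidyr", quietly = TRUE)) {
  target <- tidyr::separate(df, values, into = c("gender", "ht", "wt"), sep = ":")
  print(target)
}
```

By default `separate()` drops the input column (`remove = TRUE`) and leaves the new columns as character (`convert = FALSE`), which is the behaviour a hand-written replacement needs to match.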

The problem I'm running into is naming the new columns in my function. This is what I have so far:


  into <- c() 
  names(into) <- c(a = "", b = "", c = "") 

But when I run my new function, I get the error "attempt to set an attribute on NULL".
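The error comes from the first line: `c()` with no arguments returns `NULL`, and R cannot attach a names attribute to `NULL`. A minimal sketch of the usual fix, assuming the new names are simply passed in as a character vector (the `pieces` list below is illustrative data standing in for the split values):

```r
# c() with no arguments is NULL, so there is nothing to attach names to
into <- c()
err <- tryCatch({
  names(into) <- c(a = "", b = "", c = "")
  NULL
}, error = function(e) conditionMessage(e))
print(err)  # the "attempt to set an attribute on NULL" message

# Instead, let `into` be the character vector of new column names ...
into <- c("gender", "ht", "wt")
# ... and use it to name the split pieces (illustrative values)
pieces <- list(c("F", "F", "M"), c("62", "68", "68"), c("130", "149", "160"))
new_cols <- as.data.frame(setNames(pieces, into), stringsAsFactors = FALSE)
print(new_cols)
```

Making `into` a function argument with a default such as `into = c("a", "b", "c")` lets the caller choose any names.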

When I remove the into/names(into) lines, I get the following output, which still contains the "values" column and has the wrong names for the new columns:

       dates   values a  b   c
1 2005-06-29 F:62:130 F 62 130
2 2005-07-16 F:68:149 F 68 149
3 2005-12-01 M:68:160 M 68 160

How do I create an into argument in the function that lets me name the columns whatever I want?

Upvotes: 1

Views: 52

Answers (2)

akrun

Reputation: 887213

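In base R, we can use substr/substring with fixed character positions (this relies on every value having the same layout: a one-character gender, a two-digit height, and the weight from position 6 onward)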

transform(df, a  = substr(values, 1, 1),
              b  = substring(values, 3, 4),
              wt = substring(values, 6))
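If the fields can vary in width (for example a one-digit height), fixed positions break. A sketch that instead captures the three colon-separated fields with a regular expression in `sub()`, assuming every value has exactly three fields; `pat` and the column names `gender`, `ht`, `wt` are illustrative choices:

```r
df <- data.frame(dates = c("2005-06-29", "2005-07-16", "2005-12-01"),
                 values = c("F:62:130", "F:68:149", "M:68:160"),
                 stringsAsFactors = FALSE)

# Three capture groups, one per colon-separated field, of any width
pat <- "^([^:]*):([^:]*):([^:]*)$"

out <- transform(df,
                 gender = sub(pat, "\\1", values),
                 ht     = sub(pat, "\\2", values),
                 wt     = sub(pat, "\\3", values))
out$values <- NULL  # drop the original column
out
```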

Another, simpler option is to read the column with read.table using sep = ":", and add the resulting columns to the original dataset, either by assignment (as below) or with cbind:

df[c('a', 'b', 'wt')] <- read.table(text = df$values,  sep=":",  header = FALSE)
df
#       dates   values a  b  wt
#1 2005-06-29 F:62:130 F 62 130
#2 2005-07-16 F:68:149 F 68 149
#3 2005-12-01 M:68:160 M 68 160
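One caveat with `read.table()`: it runs `type.convert()` on each column, and the strings "T"/"F" count as logical, so if every row happened to be "F" the first column would come back as `FALSE` rather than "F". A sketch that pins the types with `colClasses` and names the columns with `col.names` (both standard `read.table()` arguments; the names and classes chosen here are just examples):

```r
df <- data.frame(dates = c("2005-06-29", "2005-07-16", "2005-12-01"),
                 values = c("F:62:130", "F:68:149", "M:68:160"),
                 stringsAsFactors = FALSE)

# Pin the column types so a gender column of only "F"/"T" stays text,
# and name the new columns directly
parts <- read.table(text = df$values, sep = ":", header = FALSE,
                    col.names = c("gender", "ht", "wt"),
                    colClasses = c("character", "integer", "integer"))

# Keep everything except the original column, then append the new ones
out <- cbind(df[setdiff(names(df), "values")], parts)
str(out)

# Without colClasses, an all-"F" first column is converted to logical
class(read.table(text = c("F:62:130", "F:68:149"), sep = ":")$V1)
```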

The OP's function can be changed to

myfunc <- function(df, colnum = 2, into = c("a", "b", "c"), sep = ":") {

  # Use "colnum" to access the specified column of "df"
  j1 <- colnum
  colnum <- df[ , colnum]

  # Split "df" using the specified separator 
  storage <- strsplit(colnum, split = sep)


  # Take the first/second/third element of each split and store them in a, b and c
  a <- sapply(storage, function(x) x[1])
  b <- sapply(storage, function(x) x[2])
  c <- sapply(storage, function(x) x[3])

  out <- cbind(df, setNames(list(a, b, c), into))
  out[setdiff(names(out), names(df)[j1])]

}

myfunc(df)
#       dates a  b   c
#1 2005-06-29 F 62 130
#2 2005-07-16 F 68 149
#3 2005-12-01 M 68 160



myfunc(df, into = c('a1', 'b1', 'c1'))
#      dates a1 b1  c1
#1 2005-06-29  F 62 130
#2 2005-07-16  F 68 149
#3 2005-12-01  M 68 160
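The function above always produces exactly three columns and appends them at the end. A more general sketch, still base R only, accepts any number of names in `into`, takes the column by name or position, and puts the new columns where the old one was (as `separate()` does). The name `my_separate` is hypothetical; like tidyr it treats `sep` as a regular expression, but it simply stops with an error when a row does not split into `length(into)` pieces instead of reproducing tidyr's `extra`/`fill` handling.

```r
my_separate <- function(df, col, into, sep = ":") {
  nm  <- names(df)
  # Accept the column either by name or by position
  pos <- if (is.character(col)) match(col, nm) else col
  stopifnot(!is.na(pos))

  # Split each value; `sep` is used as a regular expression, as in tidyr
  pieces <- strsplit(as.character(df[[pos]]), split = sep)
  if (any(lengths(pieces) != length(into))) {
    stop("every value must split into exactly ", length(into), " pieces")
  }

  # One row per original row, one column per piece, named by `into`
  new_cols <- as.data.frame(do.call(rbind, pieces), stringsAsFactors = FALSE)
  names(new_cols) <- into

  # Drop the original column and insert the new ones in its place
  out <- cbind(df[-pos], new_cols)
  out[append(nm[-pos], into, after = pos - 1)]
}

df <- data.frame(dates = c("2005-06-29", "2005-07-16", "2005-12-01"),
                 values = c("F:62:130", "F:68:149", "M:68:160"),
                 stringsAsFactors = FALSE)
my_separate(df, "values", into = c("gender", "ht", "wt"))
```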

Upvotes: 1

ThomasIsCoding

Reputation: 101733

Here is a base R solution

dfout <- cbind(df, `colnames<-`(do.call(rbind, strsplit(df$values, ":")), c("a", "b", "wt")))

which gives

> dfout
       dates   values a  b  wt
1 2005-06-29 F:62:130 F 62 130
2 2005-07-16 F:68:149 F 68 149
3 2005-12-01 M:68:160 M 68 160
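The backtick-quoted `colnames<-` call is the replacement function behind `colnames(x) <- value`, used inline. Written out step by step, and then dropping the original column as the question asks, the same approach looks like this (a sketch; `m` is just an intermediate name):

```r
df <- data.frame(dates = c("2005-06-29", "2005-07-16", "2005-12-01"),
                 values = c("F:62:130", "F:68:149", "M:68:160"),
                 stringsAsFactors = FALSE)

# Split every value at ":" and stack the pieces into a character matrix
m <- do.call(rbind, strsplit(df$values, ":"))
colnames(m) <- c("a", "b", "wt")

# Bind the matrix to the data frame, then drop the original column
dfout <- cbind(df, m)
dfout$values <- NULL
dfout
```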

Upvotes: 0
