fastlanes

Reputation: 333

Best way to apply a custom function to existing column to create a new column in data frame in R

I have a data frame with a character column containing comma-delimited strings of numbers, e.g. 1, 2, 3, 4. I have a custom function that I would like to apply row-wise to each value in that column, and I want to store the result in a new column of the data frame df.

Initial data frame

A B str
1 1 1, 2, 5
1 2 NA
2 1 NA
2 2 1, 3

Final data frame

A B str      res
1 1 1, 2, 5  2
1 2 NA       0
2 1 NA       0
2 2 1, 3     1

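This is my custom function, getCounts. It counts how many of the comma-separated numbers in str lie between x and y (inclusive) and returns 0 when str is NA: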

getCounts <- function(str, x, y){
  if (is.na(str)){
    return(as.integer(0))
  }
  vec <- as.integer(unlist(strsplit(str, ',')))
  count <- 0
  for (i in vec) {
    if (i >= x & i <= y){
      count <- count + 1
    }
  }
  return(as.integer(count))
}


I originally tried lapply, since other posts suggested it was the best fit, but I kept getting errors such as:

df <- df %>% mutate(res = lapply(df$str, getCounts(df$str, 0, 2)))
Error: Problem with `mutate()` input `res`.
x missing value where TRUE/FALSE needed
i Input `res` is `lapply(df$str, getCounts(df$str, 0, 2))`

The only thing that seems to work is mapply, but I don't really understand why, or whether there is a better way to do this.

df <- df %>% mutate(res = mapply(getCounts, df$str, 0, 2))
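A minimal base-R sketch (no dplyr, so it runs on its own) of why the lapply call failed and what the working forms look like. The second argument of lapply is the function to apply (FUN), and further arguments such as x and y are passed by name after it. The call above instead evaluated getCounts(df$str, 0, 2) once on the whole column before lapply ever ran. mapply works because it calls getCounts once per element of df$str, recycling the scalars 0 and 2. vapply is not from the original posts; it is shown as one option because, unlike lapply, it returns a plain integer vector rather than a list.

```r
# Sample data from the question (stringsAsFactors = FALSE matters on R < 4.0)
df <- data.frame(
  A = c(1, 1, 2, 2),
  B = c(1, 2, 1, 2),
  str = c("1, 2, 5", NA, NA, "1, 3"),
  stringsAsFactors = FALSE
)

# The asker's getCounts(), unchanged in behaviour
getCounts <- function(str, x, y) {
  if (is.na(str)) {
    return(as.integer(0))
  }
  vec <- as.integer(unlist(strsplit(str, ",")))
  count <- 0
  for (i in vec) {
    if (i >= x & i <= y) {
      count <- count + 1
    }
  }
  return(as.integer(count))
}

# Failing pattern: getCounts(df$str, 0, 2) runs ONCE on the whole column, which
# let NA values reach the if() inside the loop ("missing value where TRUE/FALSE
# needed"); newer R versions may instead stop at "the condition has length > 1".

# Correct lapply: pass the function itself; extra arguments go by name
res_list <- lapply(df$str, getCounts, x = 0, y = 2)   # a list, one element per row

# vapply makes the same per-element calls but returns an integer vector
df$res <- vapply(df$str, getCounts, integer(1), x = 0, y = 2, USE.NAMES = FALSE)

# mapply also calls getCounts once per element, recycling 0 and 2
res_mapply <- mapply(getCounts, df$str, 0, 2, USE.NAMES = FALSE)

print(df)
```

Inside mutate(), the same calls can use str in place of df$str.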

Upvotes: 0

Views: 1238

Answers (2)

Matt Kaye

Reputation: 530

If I'm reading this right, you should be able to just use rowwise():

df %>%
  rowwise() %>%
  mutate(res = getCounts(str, 0, 2)) %>%
  ungroup()

With your data:

data.frame(
    A = c(1,1,2,2),
    B = c(1,2,1,2),
    str = c('1, 2, 5', NA, NA, '1, 3')
) -> df

getCounts <- function(str, x, y){
    if (is.na(str)){
        return(as.integer(0))
    }
    vec <- as.integer(unlist(strsplit(str, ',')))
    count <- 0
    for (i in vec) {
        if (i >= x & i <= y){
            count <- count + 1
        }
    }
    return(as.integer(count))
}

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df %>%
    rowwise() %>%
    mutate(res = getCounts(str, 0, 2)) %>%
    ungroup()
#> # A tibble: 4 x 4
#>       A     B str       res
#>   <dbl> <dbl> <chr>   <int>
#> 1     1     1 1, 2, 5     2
#> 2     1     2 <NA>        0
#> 3     2     1 <NA>        0
#> 4     2     2 1, 3        1

Created on 2021-03-17 by the reprex package (v1.0.0)

Upvotes: 1

ThomasIsCoding

Reputation: 102349

You can try Vectorize:

df %>%
  mutate(res = Vectorize(getCounts)(str, 0, 2))
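Vectorize(getCounts) builds a wrapper function that calls mapply() over its arguments, so it is essentially the mapply call from the question in another form. A small base-R check of that, reusing the question's str values. One side effect worth knowing: with the default USE.NAMES = TRUE, both results are named after the input strings, which is harmless in a data frame column and can be dropped with unname().

```r
# The question's getCounts(), written compactly with the same behaviour
getCounts <- function(str, x, y) {
  if (is.na(str)) return(as.integer(0))
  vec <- as.integer(unlist(strsplit(str, ",")))
  count <- 0
  for (i in vec) {
    if (i >= x & i <= y) count <- count + 1
  }
  as.integer(count)
}

s <- c("1, 2, 5", NA, NA, "1, 3")   # the str column from the question

# Vectorize() returns a new function that loops over its arguments via mapply()
getCountsV <- Vectorize(getCounts)
res_vec <- getCountsV(s, 0, 2)

# The equivalent explicit mapply() call
res_mapply <- mapply(getCounts, s, 0, 2)

print(res_vec)          # named after the input strings
print(unname(res_vec))  # plain integer vector
```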

or sapply:

df %>%
  mutate(res = sapply(str, getCounts, x = 0, y = 2))
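Both answers still call getCounts once per value. If the per-row if/for logic is not needed, one alternative (not from either answer) is to split the whole column with a single strsplit() call and replace the loop with a vectorized comparison and sum(). count_in_range below is a hypothetical helper written for illustration; like getCounts, it returns 0 for an NA string.

```r
# Hypothetical helper (not from the original posts): counts how many of the
# comma-separated integers in each string fall within [lo, hi].
count_in_range <- function(s, lo, hi) {
  # One strsplit() call for the whole column; an NA string yields NA_character_
  parts <- strsplit(s, ",", fixed = TRUE)
  vapply(parts, function(p) {
    v <- as.integer(p)                      # " 2" -> 2L; NA stays NA
    sum(v >= lo & v <= hi, na.rm = TRUE)    # NA comparisons are dropped, giving 0
  }, integer(1))
}

df <- data.frame(
  A = c(1, 1, 2, 2),
  B = c(1, 2, 1, 2),
  str = c("1, 2, 5", NA, NA, "1, 3"),
  stringsAsFactors = FALSE
)

df$res <- count_in_range(df$str, 0, 2)
print(df)
```

Because count_in_range takes the whole column at once, it could be used as mutate(res = count_in_range(str, 0, 2)) without rowwise().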

Upvotes: 1
