jimiclapton

Reputation: 889

Error using lapply to pass dataframe variable through custom function

I have a function that was suggested by a user as an answer to my previous question:

word_string <- function(x) {
  inds <- seq_len(nchar(x))
  start = inds[-length(inds)]
  stop = inds[-1]
  substring(x, start, stop)
}

The function works as expected and breaks a given word down into overlapping two-character parts, as per my specifications:

 word_string('microwave')
[1] "mi" "ic" "cr" "ro" "ow" "wa" "av" "ve"

What I now want to do is apply the function to every row of a specified column in a dataframe.

Here's a dataframe for purposes of illustration:

word <- c("House", "Motorcar", "Boat", "Dog", "Tree", "Drink")
some_value <- c("2","100","16","999", "65","1000000")
my_df <- data.frame(word, some_value, stringsAsFactors = FALSE ) 
my_df
      word some_value
1    House          2
2 Motorcar        100
3     Boat         16
4      Dog        999
5     Tree         65
6    Drink    1000000

Now, if I use lapply to apply the function to my dataframe, I get incorrect results along with a warning message.

 lapply(my_df['word'], word_string)
$word
[1] "Ho" "ot" "at" ""   "Tr" "ri"

Warning message:
In seq_len(nchar(x)) : first element used of 'length.out' argument

So you can see that the function is being applied, but only partially: each row yields a single fragment instead of the full set. The desired output would be something like:

[1] "ho" "ou" "us" "se"
[2] "mo" "ot" "to" "or" "rc" "ca" "ar"
[3] "bo" "oa" "at"
[4] "do" "og"
[5] "tr" "re" "ee" 
[6] "dr" "ri" "in" "nk"

Any guidance greatly appreciated.

Upvotes: 1

Views: 64

Answers (1)

akrun

Reputation: 887901

The reason is that indexing with [ and no comma, as in my_df['word'], still returns a data.frame with one column, so the unit lapply works on here is that single column.

str(my_df['word'])
#'data.frame':   6 obs. of  1 variable:
# $ word: chr  "House" "Motorcar" "Boat" "Dog" ...

lapply therefore loops over that single column, passing the whole vector of six words to word_string at once, instead of looping over each element in that column.
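
As a quick check, here is a minimal sketch that rebuilds the example data from the question and compares the two ways of indexing. The names one_col and as_vec are only illustrative; they do not appear in the original post.

```r
# Rebuild the example data from the question
my_df <- data.frame(
  word = c("House", "Motorcar", "Boat", "Dog", "Tree", "Drink"),
  some_value = c("2", "100", "16", "999", "65", "1000000"),
  stringsAsFactors = FALSE
)

one_col <- my_df['word']    # still a data.frame (a list holding one column)
as_vec  <- my_df[['word']]  # the column itself, a character vector

class(one_col)   # "data.frame"
class(as_vec)    # "character"

# lapply iterates over elements: one column vs. six strings
length(one_col)  # 1
length(as_vec)   # 6

identical(as_vec, my_df$word)  # TRUE: [[ and $ extract the same vector
```

Because one_col has length 1, lapply calls word_string once with all six words, which is what produces the partial output shown in the question.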

We need either $ or [[ to extract the column as a vector:

lapply(my_df[['word']], word_string)
#[[1]]
#[1] "Ho" "ou" "us" "se"

#[[2]]
#[1] "Mo" "ot" "to" "or" "rc" "ca" "ar"

#[[3]]
#[1] "Bo" "oa" "at"

#[[4]]
#[1] "Do" "og"

#[[5]]
#[1] "Tr" "re" "ee"

#[[6]]
#[1] "Dr" "ri" "in" "nk"
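
The desired output in the question is lowercase, so one possible follow-up, sketched here under the assumption that lowercasing is actually wanted, is to pass the words through tolower first. Naming the results after the source words is an extra convenience of this sketch, and the names words and pairs are hypothetical.

```r
# word_string as defined in the question
word_string <- function(x) {
  inds <- seq_len(nchar(x))
  start <- inds[-length(inds)]
  stop <- inds[-1]
  substring(x, start, stop)
}

words <- c("House", "Motorcar", "Boat", "Dog", "Tree", "Drink")

# Lowercase first, then name each result after its source word
pairs <- lapply(tolower(words), word_string)
names(pairs) <- words

pairs[["House"]]
#[1] "ho" "ou" "us" "se"
pairs[["Dog"]]
#[1] "do" "og"
```

The result is still a list with one character vector per word, so it can be indexed by position or by name.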

Upvotes: 2
