Reputation: 301
I'm trying to create a new column in my tibble that collects and formats all the words found in all the other columns. I would like to do this using dplyr, if possible.
Original data frame:
df <- read.table(text = " columnA columnB
1 A Z
2 B Y
3 C X
4 D W
5 E V
6 F U " )
As a simplified example, I am hoping to do something like:
df %>%
rowwise() %>%
mutate(newColumn = myFunc(.))
And have the output look like this:
columnA columnB newColumn
1 A Z AZ
2 B Y BY
3 C X CX
4 D W DW
5 E V EV
6 F U FU
When I try this in my code, the output looks like:
columnA columnB newColumn
1 A Z ABCDEF
2 B Y ABCDEF
3 C X ABCDEF
4 D W ABCDEF
5 E V ABCDEF
6 F U ABCDEF
myFunc should take one row as an argument but when I try using rowwise() I seem to be passing the entire tibble into the function (I can see this from adding a print function into myFunc).
How can I pass just one row and do this iteratively so that it applies the function to every row? Can this be done with dplyr?
Edit:
myFunc in the example is simplified for the sake of my question. The actual function looks like this:
get_chr_vector <- function(row) {
row <- row[,2:ncol(row)] # I need to skip the first column
words <- str_c(row, collapse = ' ')
words <- str_to_upper(words)
words <- unlist(str_split(words, ' '))
words <- words[words != '']
words <- words[!nchar(words) <= 2]
words <- removeWords(words, stopwords_list) # from the tm library
words <- paste(words, sep = ' ', collapse = ' ')
}
Upvotes: 2
Reputation: 13691
Take a look at ?dplyr::do and ?purrr::map, which allow you to apply arbitrary functions to arbitrary columns and to chain the results through multiple unary operators. For example,
df1 <- df %>% rowwise %>% do( X = as_data_frame(.) ) %>% ungroup
# # A tibble: 6 x 1
# X
# * <list>
# 1 <tibble [1 x 2]>
# 2 <tibble [1 x 2]>
# ...
Notice that column X now contains 1x2 data.frames (or tibbles) made up of rows from your original data.frame. You can now pass each one to your custom myFunc using map.
myFunc <- function(Y) {paste0( Y$columnA, Y$columnB )}
df1 %>% mutate( Result = map(X, myFunc) )
# # A tibble: 6 x 2
# X Result
# <list> <list>
# 1 <tibble [1 x 2]> <chr [1]>
# 2 <tibble [1 x 2]> <chr [1]>
# ...
The Result column now contains the output of myFunc applied to each row of your original data.frame, as desired. You can retrieve the values by appending a tidyr::unnest operation to the pipeline.
df1 %>% mutate( Result = map(X, myFunc) ) %>% unnest
# # A tibble: 6 x 3
# Result columnA columnB
# <chr> <fctr> <fctr>
# 1 AZ A Z
# 2 BY B Y
# 3 CX C X
# ...
If desired, unnest can be limited to specific columns, e.g., unnest(Result).
EDIT: Because your original data.frame contains only two columns, you can actually skip the do step and use purrr::map2 instead. The syntax is very similar to map:
myFunc <- function( a, b ) {paste0(a,b)}
df %>% mutate( Result = map2( columnA, columnB, myFunc ) )
Note that myFunc is now defined as a binary function.
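One thing to keep in mind is that map2 returns a list column. If a plain character column is wanted, the typed variants in purrr may be more convenient. Below is a minimal sketch, assuming the data from the question is read with stringsAsFactors = FALSE (my assumption, so the columns are character rather than factor) and that myFunc returns exactly one string per row. The names res_map2 and res_pmap are only illustrative.

```r
library(dplyr)
library(purrr)

# Same data as in the question; stringsAsFactors = FALSE is an assumption
# made here so that columnA and columnB are character vectors.
df <- read.table(text = " columnA columnB
1 A Z
2 B Y
3 C X
4 D W
5 E V
6 F U ", stringsAsFactors = FALSE)

# Binary function: receives one value from each column per call.
myFunc <- function(a, b) paste0(a, b)

# map2_chr calls myFunc once per row and returns a character vector,
# so Result is an ordinary column rather than a list column.
res_map2 <- df %>% mutate(Result = map2_chr(columnA, columnB, myFunc))

# pmap_chr generalizes the same idea to any number of columns:
# each element of the list supplies one argument to myFunc.
res_pmap <- df %>% mutate(Result = pmap_chr(list(columnA, columnB), myFunc))

print(res_map2)
```

With more than two columns, adding them to the list passed to pmap_chr (and to the arguments of myFunc) should be enough, and no unnest step is needed afterwards.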
Upvotes: 6
Reputation: 1869
This should work. paste0 is vectorized, so mutate applies it element by element and rowwise() is not needed:
df <- read.table(text = " columnA columnB
1 A Z
2 B Y
3 C X
4 D W
5 E V
6 F U " )
df %>%
mutate(mutate_Func = paste0(columnA,columnB))
columnA columnB mutate_Func
1 A Z AZ
2 B Y BY
3 C X CX
4 D W DW
5 E V EV
6 F U FU
Upvotes: 0