Reputation: 21274
Given this example data:
require(stringr)
require(tidyverse)
labels <- c("foo", "bar", "baz")
n_rows <- 4
df <- 1:n_rows %>%
  map(~ data.frame(
    block_order=paste(sample(labels, size=length(labels), replace=FALSE),
                      collapse="|"))) %>%
  bind_rows()
df
block_order
1 foo|bar|baz
2 baz|bar|foo
3 foo|baz|bar
4 foo|bar|baz
I want to generate a column for each string in labels, which takes the value of the position of that string in the |-separated sequence in each row.
Desired output:
block_order foo bar baz
1 foo|bar|baz 1 2 3
2 baz|bar|foo 3 2 1
3 foo|baz|bar 1 3 2
4 foo|bar|baz 1 2 3
I've been trying different variations in a dplyr/purrr setup, like this example, where I map in each value of labels, and then attempt to get its position in block_order using match on str_split:
labels %>%
  map(~ df %>%
        transmute(!!.x := match(!!.x, str_split(block_order,
                                                "\\|",
                                                simplify=TRUE)))) %>%
  bind_cols(df, .)
But that produces unexpected output:
block_order foo bar baz
1 foo|bar|baz 1 5 2
2 baz|bar|foo 1 5 2
3 foo|baz|bar 1 5 2
4 foo|bar|baz 1 5 2
I'm not really sure what these numbers represent, or why they're all the same.
If anyone can help me figure out (a) how to achieve my desired output in a dplyr/purrr framework and (b) why the proposed solution here gives the output it does, I'd be very appreciative.
Upvotes: 1
Reputation: 43354
Unless you need to for other reasons, you don't have to fully split the string if you just identify the location of the first match for each value of labels, which regexpr will give you. Mapping over labels will give a list with one element for each string in labels (so it's a quick iteration), which you can then pmap rank over to get indices. Using the *_dfr version to simplify the results to a data frame and cbinding to the original,
library(tidyverse)
set.seed(47)
labels <- c("foo", "bar", "baz")
df <- tibble(block_order = replicate(10, paste(sample(labels), collapse = "|")))
labels %>%
  map(~regexpr(.x, df$block_order)) %>%
  pmap_dfr(~set_names(as.list(rank(c(...))), labels)) %>%
  bind_cols(df, .)
#> # A tibble: 10 x 4
#> block_order foo bar baz
#> <chr> <dbl> <dbl> <dbl>
#> 1 baz|foo|bar 2. 3. 1.
#> 2 baz|bar|foo 3. 2. 1.
#> 3 bar|foo|baz 2. 1. 3.
#> 4 baz|foo|bar 2. 3. 1.
#> 5 foo|bar|baz 1. 2. 3.
#> 6 baz|foo|bar 2. 3. 1.
#> 7 foo|baz|bar 1. 3. 2.
#> 8 bar|baz|foo 3. 1. 2.
#> 9 baz|foo|bar 2. 3. 1.
#> 10 foo|bar|baz 1. 2. 3.
If you prefer stringr/stringi to base regex, you could do the same thing by changing the regexpr call to str_locate(df$block_order, .x)[, "start"] or stringi::stri_locate_first_fixed in the same arrangement.
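The regexpr-then-rank idea can also be sketched in base R alone, which makes the intermediate values easier to inspect. This is a minimal sketch, not the answer's exact pipeline: the three example strings and the names starts and positions are made up for illustration, and it assumes no label is a substring of another label (otherwise the first regex match could land inside a different label).

```r
labels <- c("foo", "bar", "baz")
# Hypothetical example rows, including one that is not its own inverse.
block_order <- c("foo|bar|baz", "baz|bar|foo", "bar|baz|foo")

# Character offset at which each label starts in each string:
# one row per string, one column per label.
starts <- sapply(labels, function(lab) {
  as.integer(regexpr(lab, block_order, fixed = TRUE))
})

# Ranking the offsets within each row converts them to positions 1..n.
positions <- t(apply(starts, 1, rank))
positions
```

Because only the ordering of the offsets matters, the separator never has to be parsed.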
Upvotes: 4
Reputation: 887611
We can split the 'block_order' by |, loop through the list of vectors using lapply, get the index with match, rbind the vectors and assign it to create new columns:
labels <- c("foo", "bar", "baz")
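# Passing x = labels makes each split row the lookup table, so the result is
# the position of each label within the row (not the label index of each position).
df[labels] <- do.call(rbind, lapply(strsplit(df$block_order, "|",
                                             fixed = TRUE), match, x = labels))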
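The argument order of match matters in this approach, because match(x, table) returns where each element of x sits in table. A small sketch, using a made-up row "bar|baz|foo" and illustrative variable names, shows that the two directions give different answers whenever the ordering is not its own inverse:

```r
labels <- c("foo", "bar", "baz")
# Hypothetical row chosen so that the two directions disagree.
parts <- strsplit("bar|baz|foo", "|", fixed = TRUE)[[1]]

# Position of each label within the sequence: what the question asks for.
# foo is 3rd, bar is 1st, baz is 2nd.
pos_of_label <- match(labels, parts)

# Index of each sequence element within labels: the inverse permutation.
label_of_pos <- match(parts, labels)

pos_of_label
label_of_pos
```

All four rows in the question's sample data happen to be orderings for which both directions agree, so a check against that sample alone cannot tell them apart.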
Or a similar idea with tidyverse:
library(tidyverse)
str_split(df$block_order, "[|]") %>%
  map(~ match(labels, .x)) %>%   # position of each label within the split row
  do.call(rbind, .) %>%
  as_tibble %>%
  set_names(labels) %>%
  bind_cols(df, .)
# block_order foo bar baz
#1 foo|bar|baz 1 2 3
#2 baz|bar|foo 3 2 1
#3 foo|baz|bar 1 3 2
#4 foo|bar|baz 1 2 3
Another option would be to use separate_rows, reshape it to 'long' format and spread it back:
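df %>%
  mutate(rn = row_number()) %>%   # integer row id, so rows keep their order past 9
  separate_rows(block_order) %>%
  group_by(rn) %>%
  # match(labels, block_order) gives the position of each label within the row
  mutate(ind = match(labels, block_order), labels = factor(labels, levels = labels)) %>%
  select(-block_order) %>%
  spread(labels, ind) %>%
  ungroup %>%
  select(-rn) %>%
  bind_cols(df, .)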
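On part (b) of the question, the 1 5 2 output can be reproduced outside dplyr. Inside transmute, str_split(block_order, "\\|", simplify=TRUE) sees the whole column and returns a single character matrix, and match treats that matrix as one long vector in column-major order. The sketch below uses do.call(rbind, strsplit(...)) as a base-R stand-in for that matrix; the names m and first_hit are illustrative only.

```r
labels <- c("foo", "bar", "baz")
block_order <- c("foo|bar|baz", "baz|bar|foo", "foo|baz|bar", "foo|bar|baz")

# Base-R stand-in for str_split(..., simplify = TRUE): a 4 x 3 character matrix.
m <- do.call(rbind, strsplit(block_order, "|", fixed = TRUE))

# match() walks the matrix column by column and reports the first hit,
# so each label yields a single number for the whole data frame.
first_hit <- sapply(labels, function(lab) match(lab, m))
first_hit
```

The first column of the matrix is foo, baz, foo, foo, so "foo" is found at index 1 and "baz" at index 2, while "bar" first appears at the top of the second column, index 5. Each of those single values is then recycled down every row, which is why all rows look the same.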
Upvotes: 5
Reputation: 739
I think this might work:
library(tidyr)
library(purrr)
position_counter <- function(...) {
  row = list(...)
  row %>% map(~which(row == .)) %>% setNames(row)
}
df %>%
  separate(block_order, labels) %>%
  pmap_df(position_counter)
Upvotes: 1