Reputation: 832
I have ugly data that looks like this:
source_data <- data.frame(thing = c('C', 'E', 'G'), ugly_sequence_string = c('A,B,C', 'D,E,F', 'G,H,I'))
I would like to add a column with the integer position of thing in ugly_sequence_string:
target_data <- data.frame(thing = c('C', 'E', 'G'), position = c(3L, 2L, 1L))
I feel like this has to be possible with some combination of strsplit (or stringr::str_split), dplyr::mutate, which, and maybe purrr::map, but I'm failing to wrap my mind around some aspect of how to do it. For example, this definitely doesn't work:
source_data %>%
dplyr::mutate(
position = which(stringr::str_split(ugly_sequence_string, ',') == thing)
)
I've tried breaking that out into a function (with various combinations of unlist() and as.list() to get the input into a format that which() is happy with), but it seems like this might be an easy thing that I'm just not grokking. Suggestions?
Upvotes: 2
Views: 63
Reputation: 79238
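d <- source_data  # d is the question's data; the regexpr approach assumes single-character items
transform(d, here = mapply(function(x, y) regexpr(x, gsub(",", "", y))[[1]], d$thing, d$ugly_sequence_string))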
  thing ugly_sequence_string here
C     C                A,B,C    3
E     E                D,E,F    2
G     G                G,H,I    1
or even:
here = mapply(function(x, y) match(x, strsplit(y, ",")[[1]]), d[, 1], d[, 2])
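The match() variant can also be written so that the string is split once for the whole column rather than inside the anonymous function. This is only a minimal sketch in base R: the name pieces is made up for illustration, and stringsAsFactors = FALSE is an assumption so that the columns are character vectors on older R versions too.

```r
source_data <- data.frame(
  thing = c("C", "E", "G"),
  ugly_sequence_string = c("A,B,C", "D,E,F", "G,H,I"),
  stringsAsFactors = FALSE  # assumption: keep both columns as character
)

# Split every string once; the result is a list with one character vector per row
pieces <- strsplit(source_data$ugly_sequence_string, ",", fixed = TRUE)

# match(x, table) gives the first position of x in table, or NA if it is absent;
# mapply() pairs each thing with the pieces from the same row
source_data$position <- mapply(match, source_data$thing, pieces, USE.NAMES = FALSE)

source_data
```

Unlike the regexpr() version, this does not depend on every item being a single character, and a thing that is missing from its string comes back as NA instead of a zero-length result.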
Upvotes: 0
Reputation: 20095
One way could be to use base R's mapply together with stringr::str_split:
source_data <- data.frame(thing = c('C', 'E', 'G'),
                          ugly_sequence_string = c('A,B,C', 'D,E,F', 'G,H,I'))
library(stringr)

# Return the position of y within the comma-separated string x
find_thing <- function(x, y) {
  which(stringr::str_split(x, ',')[[1]] == y)
}
source_data$position <- mapply(find_thing,
                               source_data$ugly_sequence_string, source_data$thing)
Result:
> source_data
  thing ugly_sequence_string position
1     C                A,B,C        3
2     E                D,E,F        2
3     G                G,H,I        1
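Since the question asked about dplyr::mutate and purrr, the same row-by-row idea might look roughly like this inside a pipeline. It is a sketch rather than a tested recommendation: it assumes dplyr and purrr are installed, swaps which() for match() so that each row always yields exactly one integer (NA when the item is missing), and uses the illustrative argument names parts and item.

```r
library(dplyr)
library(purrr)

source_data <- data.frame(
  thing = c("C", "E", "G"),
  ugly_sequence_string = c("A,B,C", "D,E,F", "G,H,I"),
  stringsAsFactors = FALSE  # assumption: keep both columns as character
)

target_data <- source_data %>%
  mutate(
    # strsplit() returns one character vector per row; map2_int() walks over
    # those vectors and the thing column in parallel, returning one integer each
    position = map2_int(
      strsplit(ugly_sequence_string, ",", fixed = TRUE),
      thing,
      function(parts, item) match(item, parts)
    )
  ) %>%
  select(thing, position)

target_data
```

The original attempt compared the whole list returned by str_split against thing in one go; here the comparison happens once per row, which is why the result has one value per row.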
Upvotes: 2
Reputation: 25385
Here is one option:
# For each row, split the string and find where thing occurs among the pieces
source_data$index <- sapply(seq_len(nrow(source_data)), function(x) {
  which(strsplit(source_data$ugly_sequence_string[x], ',')[[1]] == source_data$thing[x])
})
Output:
  thing ugly_sequence_string index
1     C                A,B,C     3
2     E                D,E,F     2
3     G                G,H,I     1
Hope this helps!
Upvotes: 2