Reputation: 1233
I have data where some of the items are numbers separated by "|", like:
head(mintimes)
[1] "3121|3151" "1171" "1351|1381" "1050" "" "122"
head(minvalues)
[1] 14 10 11 31 Inf 22
I would like to extract all the times and match each one to its minvalue, ending up with something like:
times values
3121 14
3151 14
1171 10
1351 11
1381 11
1050 31
122 22
I've tried strsplit(mintimes, "|") and str_extract(mintimes, "[0-9]+"), but neither seems to work. Any ideas?
Upvotes: 6
Views: 188
Reputation: 109874
Here's a qdap package approach:
mintimes <- c("3121|3151", "1171", "1351|1381", "1050", "", "122")
minvalues <- c(14, 10, 11, 31, Inf, 22)
library(qdap)
list2df(setNames(strsplit(mintimes, "\\|"), minvalues), "times", "values")
## times values
## 1 3121 14
## 2 3151 14
## 3 1171 10
## 4 1351 11
## 5 1381 11
## 6 1050 31
## 7 122 22
Upvotes: 1
Reputation: 13056
And here's another solution using stri_split_fixed from the stringi package. As a bonus, we also play with mapply and do.call.
Input data:
mintimes <- c("3121|3151", "1171", "1351|1381", "1050", "", "122")
minvalues <- c(14, 10, 11, 31, Inf, 22)
Split mintimes on | and convert to numeric:
library("stringi")
mintimes <- lapply(stri_split_fixed(mintimes, "|"), as.numeric)
## [[1]]
## [1] 3121 3151
##
## [[2]]
## [1] 1171
##
## [[3]]
## [1] 1351 1381
##
## [[4]]
## [1] 1050
##
## [[5]]
## [1] NA
##
## [[6]]
## [1] 122
Column-bind each element of minvalues with the corresponding mintimes:
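# SIMPLIFY = FALSE guarantees a list, even when every element holds a single time
tmp <- mapply(cbind, mintimes, minvalues, SIMPLIFY = FALSE)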
## [[1]]
## [,1] [,2]
## [1,] 3121 14
## [2,] 3151 14
##
## [[2]]
## [,1] [,2]
## [1,] 1171 10
##
## [[3]]
## [,1] [,2]
## [1,] 1351 11
## [2,] 1381 11
##
## [[4]]
## [,1] [,2]
## [1,] 1050 31
##
## [[5]]
## [,1] [,2]
## [1,] NA Inf
##
## [[6]]
## [,1] [,2]
## [1,] 122 22
Row-bind all 6 matrices and remove the NA rows:
res <- do.call(rbind, tmp)
res[!is.na(res[,1]),]
## [,1] [,2]
## [1,] 3121 14
## [2,] 3151 14
## [3,] 1171 10
## [4,] 1351 11
## [5,] 1381 11
## [6,] 1050 31
## [7,] 122 22
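The matrix above has no column names. As a rough sketch of the same split, bind and stack idea that ends in the times/values data frame the question asks for, the following uses only base R. Here strsplit() with fixed = TRUE stands in for stri_split_fixed(), and the names pieces, parts and result are made up for illustration.

```r
mintimes  <- c("3121|3151", "1171", "1351|1381", "1050", "", "122")
minvalues <- c(14, 10, 11, 31, Inf, 22)

# Base-R stand-in for stri_split_fixed(). Unlike stringi, strsplit() returns
# character(0) for "", so the empty entry becomes a zero-row piece, not NA.
pieces <- strsplit(mintimes, "|", fixed = TRUE)

# One small data frame per element, repeating the value once per time.
parts <- Map(function(tm, v) {
  data.frame(times = as.numeric(tm), values = rep(v, length(tm)))
}, pieces, minvalues)

# Stack the pieces; zero-row pieces contribute nothing.
result <- do.call(rbind, parts)
result
```

Because the empty string produces no rows at all, there is no separate NA-removal step in this variant.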
Upvotes: 2
Reputation: 99341
| is a regular expression metacharacter. To match it literally, either put it in a character class ([|]) or escape it with \\ (or use fixed = TRUE in functions that support it). So your call to strsplit() should be
strsplit(mintimes, "[|]")
or
strsplit(mintimes, "\\|")
or
strsplit(mintimes, "|", fixed = TRUE)
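To see why the unescaped call misbehaves, here is a small sketch on a single string. The variable names (x, broken, by_class and so on) are only for illustration.

```r
x <- "3121|3151"

# As a regex, "|" is an alternation between two empty patterns, so it
# matches the empty string and x is cut into single characters.
broken <- strsplit(x, "|")[[1]]
broken

# Each of the three fixes treats the pipe literally and gives the same pieces.
by_class  <- strsplit(x, "[|]")[[1]]
by_escape <- strsplit(x, "\\|")[[1]]
by_fixed  <- strsplit(x, "|", fixed = TRUE)[[1]]

# An empty string splits into zero pieces, which matters for the "" entry.
empty <- strsplit("", "[|]")[[1]]
```

The zero-length result for "" is what later lets the Inf value drop out of the final table without any explicit filtering.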
Regarding your other attempt with stringr functions: str_extract() returns only the first match in each string, whereas str_extract_all() returns every match, which seems to do the trick.
library(stringr)
str_extract_all(mintimes, "[0-9]+")
To get your desired result,
mintimes <- c("3121|3151", "1171", "1351|1381", "1050", "", "122")
minvalues <- c(14, 10, 11, 31, Inf, 22)
s <- strsplit(mintimes, "[|]")
data.frame(times = as.numeric(unlist(s)),
           values = rep(minvalues, sapply(s, length)))
# times values
# 1 3121 14
# 2 3151 14
# 3 1171 10
# 4 1351 11
# 5 1381 11
# 6 1050 31
# 7 122 22
Upvotes: 6
Reputation: 193547
I have written a function called cSplit that is useful for these types of things. You can get it from my Gist: https://gist.github.com/mrdwab/11380733
Usage would be:
library(data.table)
# assumes cSplit() has already been sourced from the Gist above
cSplit(data.table(mintimes, minvalues), "mintimes", "|", "long")
# mintimes minvalues
# 1: 3121 14
# 2: 3151 14
# 3: 1171 10
# 4: 1351 11
# 5: 1381 11
# 6: 1050 31
# 7: 122 22
It also has a "wide" setting, in case that would be at all useful to you:
cSplit(data.table(mintimes, minvalues), "mintimes", "|", "wide")
# minvalues mintimes_1 mintimes_2
# 1: 14 3121 3151
# 2: 10 1171 NA
# 3: 11 1351 1381
# 4: 31 1050 NA
# 5: Inf NA NA
# 6: 22 122 NA
Note: The output is a data.table.
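For readers who want to see what the "wide" layout involves, here is a rough base-R sketch of the same idea. It is not how cSplit is implemented, and the names pieces, n_max, wide and wide_df are made up for illustration.

```r
mintimes  <- c("3121|3151", "1171", "1351|1381", "1050", "", "122")
minvalues <- c(14, 10, 11, 31, Inf, 22)

pieces <- strsplit(mintimes, "|", fixed = TRUE)
n_max  <- max(lengths(pieces))   # widest row decides the number of columns

# Indexing past the end of a vector yields NA, which pads the short rows.
wide <- do.call(rbind, lapply(pieces, function(p) as.numeric(p)[seq_len(n_max)]))
colnames(wide) <- paste0("mintimes_", seq_len(n_max))

wide_df <- data.frame(minvalues, wide)
wide_df
```

Unlike the long form, this keeps the row with the empty string, filled entirely with NA.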
Upvotes: 3
Reputation: 9913
To get the output you want, try something like this:
library(dplyr)
Split.Times <- function(x) {
mintimes <- as.numeric(unlist(strsplit(as.character(x$mintimes), "\\|")))
return(data.frame(mintimes = mintimes, minvalues = x$minvalues, stringsAsFactors=FALSE))
}
df <- data.frame(mintimes, minvalues, stringsAsFactors=FALSE)
df %>%
filter(mintimes != "") %>%
group_by(mintimes) %>%
do(Split.Times(.))
This produces:
mintimes minvalues
1 1050 31
2 1171 10
3 122 22
4 1351 11
5 1381 11
6 3121 14
7 3151 14
(I borrowed from my answer here - which is pretty much the same question/problem)
Upvotes: 1
Reputation: 27388
As others have mentioned, you need to escape the | to include it literally in a regular expression. As always, we can skin this cat many ways, and here's one way to do it with stringr:
x <- c("3121|3151", "1171", "1351|1381", "1050", "", "122")
library(stringr)
unlist(str_extract_all(x, "\\d+"))
# [1] "3121" "3151" "1171" "1351" "1381" "1050" "122"
This won't work as expected if you have any decimal points in a character string of numbers, so the following (which says to match anything but |) might be safer:
unlist(str_extract_all(x, '[^|]+'))
# [1] "3121" "3151" "1171" "1351" "1381" "1050" "122"
Either way, you might want to wrap the result in as.numeric.
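To illustrate the decimal-point caveat, here is a small sketch on made-up input (the vector y is hypothetical and not from the question). It uses base R's regmatches() and gregexpr() as a stand-in for str_extract_all(), wrapped in an illustrative helper named extract_all.

```r
# Hypothetical input containing a decimal value
y <- c("3.5|4", "12")

# Base-R counterpart of str_extract_all(): every match in every string.
extract_all <- function(x, pattern) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

digits_only <- unlist(extract_all(y, "\\d+"))   # breaks 3.5 at the decimal point
non_pipe    <- unlist(extract_all(y, "[^|]+"))  # keeps 3.5 in one piece

digits_only
non_pipe
as.numeric(non_pipe)
```

The digit-only pattern yields four pieces while the negated character class yields three, so only the second lines up with the original values.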
Upvotes: 2
Reputation: 206308
By default, strsplit splits using a regular expression, and "|" is a special character in regular expression syntax. You can either escape it:
strsplit(mintimes, "\\|")
or set fixed = TRUE so the pattern is not treated as a regular expression:
strsplit(mintimes, "|", fixed = TRUE)
Upvotes: 4