Reputation: 1120
Probably the title is not ideal so you are welcome to change it.
Using the list.files function, I stored the file names in a character vector. Now I would like to create two numeric vectors from the numbers that appear in the file names.
> filenames
[1] "Fr. 10to14 - data.csv" "Fr. 13to17 - data.csv" "Fr. 16to20 - data.csv" "Fr. 19to24 - data.csv"
[5] "Fr. 1to5 - data.csv" "Fr. 4to8 -data.csv" "Fr. 7to11 - data.csv"
So for now I do it manually:
vec_fr_start <- c(10,13,16,19,1,4,7)
vec_fr_end <- c(14,17,20,24,5,8,11)
So I store all the "starting" numbers in one vector and all the "ending" numbers in the other. Is there an efficient way to extract those numbers from such strings?
The problem is that the files can be named differently, but the anchor (numberTOnumber) will always be the same. In that case it would be best to use the "to" string and take the number on its left and the one on its right.
Upvotes: 0
Views: 79
Reputation: 887831
We can use str_extract_all from the stringr package:
library(stringr)
lst <- lapply(str_extract_all(filenames, '\\d+'), as.numeric)
do.call(rbind, lst)
#     [,1] [,2]
#[1,]   10   14
#[2,]   13   17
#[3,]   16   20
#[4,]   19   24
The first column holds the 'start' values and the second the 'end' values.
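To get from that matrix to the two vectors asked for in the question, the columns can simply be pulled out. The sketch below is a base R stand-in for the same idea (regmatches with gregexpr instead of str_extract_all), so it runs without stringr. It assumes every file name contains exactly two numbers, and it reuses the vector names from the question.

```r
# Full set of file names from the question (including the irregular "4to8 -data").
filenames <- c("Fr. 10to14 - data.csv", "Fr. 13to17 - data.csv",
               "Fr. 16to20 - data.csv", "Fr. 19to24 - data.csv",
               "Fr. 1to5 - data.csv", "Fr. 4to8 -data.csv",
               "Fr. 7to11 - data.csv")

# Base R stand-in for str_extract_all(): every run of digits in each name.
digits <- regmatches(filenames, gregexpr("[0-9]+", filenames))

# One row per file name; assumes exactly two numbers per name.
mat <- do.call(rbind, lapply(digits, as.numeric))

vec_fr_start <- mat[, 1]
vec_fr_end   <- mat[, 2]
```

If a name could contain more than two numbers, the rows would no longer line up, which is where the more specific pattern comes in.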
To be more specific, so that only the numbers immediately before and after 'to' are extracted even if the string contains other numbers, we can use regex lookarounds.
lst <- lapply(str_extract_all(filenames,
'\\d+(?=to)|(?<=to)\\d+'), as.numeric)
and then process as before.
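As a rough illustration of what the lookarounds buy, here is the same pattern applied in base R to file names that carry extra numbers. The names in `extra` are made up for this example, and base R needs perl = TRUE for lookarounds to be understood.

```r
# Hypothetical file names with extra numbers outside the anchor.
extra <- c("Run2 Fr. 10to14 - data v3.csv",
           "Run7 Fr. 1to5 - data v12.csv")

# Digits followed by "to", or digits preceded by "to"; perl = TRUE enables lookarounds.
hits <- regmatches(extra,
                   gregexpr("\\d+(?=to)|(?<=to)\\d+", extra, perl = TRUE))

# One row per file name: column 1 is the start, column 2 the end.
pairs <- do.call(rbind, lapply(hits, as.numeric))
```

The run and version numbers are ignored because they are not adjacent to 'to'. A name that contains 'to' directly followed by digits somewhere else would still produce an extra match, so this is only as strict as the anchor itself.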
Or a base R option would be to use sub: capture the numeric strings as groups, insert a separator in the replacement, and then read the result with read.csv/read.table.
read.csv(text=sub('\\D+(\\d+)to(\\d+).*',
'\\1,\\2', filenames), header=FALSE)
#  V1 V2
#1 10 14
#2 13 17
#3 16 20
#4 19 24
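If named columns are preferred over V1 and V2, read.csv accepts a col.names argument. The following is a small sketch along those lines; the column names start and end are chosen here for illustration, and the vector names come from the question.

```r
filenames <- c("Fr. 10to14 - data.csv", "Fr. 13to17 - data.csv",
               "Fr. 16to20 - data.csv", "Fr. 19to24 - data.csv")

# Turn each name into "start,end" and parse the lines as CSV text.
df <- read.csv(text = sub("\\D+(\\d+)to(\\d+).*", "\\1,\\2", filenames),
               header = FALSE, col.names = c("start", "end"))

# The columns are already numeric, so they can be used directly as vectors.
vec_fr_start <- df$start
vec_fr_end   <- df$end
```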
Sample data used above:
filenames <- c("Fr. 10to14 - data.csv", "Fr. 13to17 - data.csv",
"Fr. 16to20 - data.csv", "Fr. 19to24 - data.csv")
Upvotes: 3
Reputation: 70336
Using base R:
> sub("\\D*(\\d+)to.*", "\\1", x) # start values
#[1] "10" "13" "16" "19"
> sub(".*to(\\d+).*", "\\1", x) # end values
#[1] "14" "17" "20" "24"
Sample input:
x <- c("Fr. 10to14 - data.csv", "Fr. 13to17 - data.csv", "Fr. 16to20 - data.csv", "Fr. 19to24 - data.csv")
This approach relies on extracting the numbers before and after the "to" in the string, as specified in the question:
The problem is that the files can be named differently, but the anchor (numberTOnumber) will always be the same. In that case it would be best to use the "to" string and take the number on its left and the one on its right.
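It also works if other numbers appear after the anchor. Numbers that appear before it are a different matter: the first pattern is not anchored to the start of the string, so sub leaves any text ahead of the match (including earlier digits) in the result.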
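Both sub calls return character vectors, while the question asks for numbers. One possible way to get numeric vectors, and to stay independent of whatever surrounds the anchor, is to match only the anchor with regexec and pull out its two capture groups. This is a sketch rather than part of the answer above, and the file names in it are hypothetical.

```r
# Hypothetical names with extra numbers before and after the anchor.
x <- c("2020 Fr. 10to14 - data.csv", "Fr. 13to17 - data 7.csv")

# Each list element holds: full match, capture group 1, capture group 2.
m <- regmatches(x, regexec("(\\d+)to(\\d+)", x))

# Pick the groups and convert to numeric; a name without a match gives NA.
vec_fr_start <- as.numeric(vapply(m, `[`, character(1), 2))
vec_fr_end   <- as.numeric(vapply(m, `[`, character(1), 3))
```

Because only the matched anchor is inspected, nothing outside it can leak into the result.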
Upvotes: 4