Shaxi Liver

Reputation: 1120

Creating a vector of numbers extracted from the strings

The title is probably not ideal, so feel free to change it.

Using the list.files function, I stored the file names in a character vector. Now I would like to create two numeric vectors from the numbers that appear in the file names.

> filenames
[1] "Fr. 10to14 - data.csv" "Fr. 13to17 - data.csv" "Fr. 16to20 - data.csv" "Fr. 19to24 - data.csv"
[5] "Fr. 1to5 - data.csv"   "Fr. 4to8 -data.csv"    "Fr. 7to11 - data.csv" 

So for now I do it manually:

vec_fr_start <- c(10,13,16,19,1,4,7) 
vec_fr_end <- c(14,17,20,24,5,8,11)

So I store all the "starting" numbers in one vector and all the "ending" numbers in another. Is there an efficient way to fish those numbers out of such strings?

The problem is that the files can be named differently, but the anchor (numberTOnumber) will always be the same. In that case it would be best to use the 'to' string and take the number on its left and the one on its right.

Upvotes: 0

Views: 79

Answers (2)

akrun

Reputation: 887831

We can use str_extract_all to pull out every run of digits, convert each element to numeric, and then rbind the results:

library(stringr)
lst <- lapply(str_extract_all(filenames, '\\d+'), as.numeric)
do.call(rbind, lst)
#      [,1] [,2]
#[1,]   10   14
#[2,]   13   17
#[3,]   16   20
#[4,]   19   24

The first column is 'start' and the second is 'end'.
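To get the two vectors the question asks for, the columns of that matrix can simply be indexed. Below is a minimal base-R sketch that swaps stringr's str_extract_all() for its base counterparts regmatches() + gregexpr(), so no extra package is needed. It assumes every file name contains exactly two numbers (the 'start' and the 'end'); the variable name m is only illustrative.

```r
# File names from the question (note the missing space in "4to8 -data.csv")
filenames <- c("Fr. 10to14 - data.csv", "Fr. 13to17 - data.csv",
               "Fr. 16to20 - data.csv", "Fr. 19to24 - data.csv",
               "Fr. 1to5 - data.csv",   "Fr. 4to8 -data.csv",
               "Fr. 7to11 - data.csv")

# Extract every run of digits per file name and convert it to numeric
lst <- lapply(regmatches(filenames, gregexpr("\\d+", filenames)), as.numeric)

# One row per file: column 1 = start, column 2 = end
m <- do.call(rbind, lst)

vec_fr_start <- m[, 1]
vec_fr_end   <- m[, 2]

print(vec_fr_start)
print(vec_fr_end)
```

This yields the same values as the hand-typed vectors in the question. It only holds while the range is the only number in the name, which is what the lookaround version below addresses.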

To be more specific, i.e. to extract only the numbers immediately before and after 'to' even when the string contains other numbers, we can use regex lookarounds:

lst <- lapply(str_extract_all(filenames,
       '\\d+(?=to)|(?<=to)\\d+'), as.numeric)

and then process as before.
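To see why the lookarounds matter, here is a sketch of the same pattern in base R, using regmatches() with gregexpr(perl = TRUE) (PCRE supports the lookahead (?=to) and the lookbehind (?<=to)). The second file name, with an extra '2' in it, is a made-up example.

```r
# Hypothetical file names; the second one contains an extra number
files <- c("Fr. 10to14 - data.csv", "Fr. 13to17 - data2.csv")

# Plain \\d+ picks up every number, including the trailing 2
all_nums <- regmatches(files, gregexpr("\\d+", files))

# Lookarounds keep only the digits directly before or directly after "to"
range_nums <- regmatches(files,
                         gregexpr("\\d+(?=to)|(?<=to)\\d+", files, perl = TRUE))

# Same post-processing as before: numeric conversion, one row per file
m <- do.call(rbind, lapply(range_nums, as.numeric))
print(m)
```

With the plain pattern the second file yields three numbers, which would break the rbind step; with the lookarounds each file yields exactly its start and end.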


Or, a base R option would be to use sub to capture the two numbers as groups, put a separator (a comma) between them in the replacement, and then read the result with read.csv/read.table:

read.csv(text=sub('\\D+(\\d+)to(\\d+).*', 
             '\\1,\\2', filenames), header=FALSE)
#  V1 V2
#1 10 14
#2 13 17
#3 16 20
#4 19 24

data

filenames <- c("Fr. 10to14 - data.csv", "Fr. 13to17 - data.csv", 
  "Fr. 16to20 - data.csv", "Fr. 19to24 - data.csv")

Upvotes: 3

talat

Reputation: 70336

Using base R:

> sub("\\D*(\\d+)to.*", "\\1", x)  # start values
#[1] "10" "13" "16" "19"

> sub(".*to(\\d+).*", "\\1", x)    # end values
#[1] "14" "17" "20" "24"

Sample input:

x <- c("Fr. 10to14 - data.csv", "Fr. 13to17 - data.csv", "Fr. 16to20 - data.csv", "Fr. 19to24 - data.csv")
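The sub() calls above return character vectors. To get the numeric vectors from the question, they can be wrapped in as.numeric(). A sketch applied to all seven file names listed in the question:

```r
x <- c("Fr. 10to14 - data.csv", "Fr. 13to17 - data.csv", "Fr. 16to20 - data.csv",
       "Fr. 19to24 - data.csv", "Fr. 1to5 - data.csv",   "Fr. 4to8 -data.csv",
       "Fr. 7to11 - data.csv")

# Start: the digits right in front of "to"
vec_fr_start <- as.numeric(sub("\\D*(\\d+)to.*", "\\1", x))

# End: the digits right after "to" (the greedy .* picks the last such "to")
vec_fr_end <- as.numeric(sub(".*to(\\d+).*", "\\1", x))

print(vec_fr_start)
print(vec_fr_end)
```

This gives the same values as the vectors typed by hand in the question.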

This approach relies on extracting the numbers before and after the 'to' in the string, as specified in the question:

The problem is that files can be named different but the anchor (numberTOnumber) will be always the same. In that case would be the best to use to string and take the number of the left and one on the right.

It will also work when other numbers appear after the range in the file name (e.g. "... - data2.csv").
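One caveat: sub() only replaces the part of the string that matched, so with the unanchored start pattern any digits in front of the range stay in the result. A possible variant, an assumption rather than part of the original answer, anchors the pattern with ^ and $ and uses a lazy .*? (via perl = TRUE), so the whole name is replaced by the numbers of the first NtoM pair. The file name below is made up.

```r
# Hypothetical file name with extra numbers before and after the range
y <- "v2 Fr. 10to14 - data3.csv"

# Unanchored start pattern from above: the leading "v2" survives
unanchored <- sub("\\D*(\\d+)to.*", "\\1", y)  # returns "v210"

# Anchored with a lazy prefix: the whole string is replaced by the captured group
fr_start <- as.numeric(sub("^.*?(\\d+)to\\d+.*$", "\\1", y, perl = TRUE))
fr_end   <- as.numeric(sub("^.*?\\d+to(\\d+).*$", "\\1", y, perl = TRUE))

print(unanchored)
print(c(fr_start, fr_end))
```

Because both patterns must match the entire string, nothing outside the captured digits can leak into the result.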

Upvotes: 4
