darkpunk

Reputation: 17

Extract number between underscore in text

I have files with names like

I'd like to create a data frame where each row is information extracted from a file name in the form of Author, Volume, Issue.

I'm able to extract the name and volume, but can't seem to get the issue number. Using the stringr package, I tried the following pattern, which gives me _4 instead of just 4.

[^a-z](?:[^_]+_){0}([^_ ]+$)  

How do I fix this?

Upvotes: 0

Views: 1173

Answers (3)

akrun

Reputation: 887531

If the issue number is always the single last character, we can use base R methods to extract it

as.numeric(substring(str1, nchar(str1)))

Or with sub

as.numeric(sub(".*_", "", str1))
#[1] 4 3 6
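As a rough sketch of how the two base R calls differ, consider an issue number with more than one digit. The vector `str2` below is made up for illustration and is not from the question.

```r
# Hypothetical file names; the second one has a two-digit issue number
str2 <- c("Hughson.George_54_4", "Ifran.Dean_51_12")

# substring() starting at nchar() keeps only the final character
last_char <- as.numeric(substring(str2, nchar(str2)))

# sub() strips everything up to the last underscore, so all digits survive
after_underscore <- as.numeric(sub(".*_", "", str2))

print(last_char)         # 4 2
print(after_underscore)  # 4 12
```

So the `sub` version is probably the safer choice unless the issue is guaranteed to be one digit.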

If we need to split it into individual columns, one option is separate from the tidyverse, which splits the column on the delimiter (_) and, with convert = TRUE, also converts each new column to an appropriate type

library(tidyverse)
# tibble() replaces the deprecated data_frame()
tibble(col1 = str1) %>%
    separate(col1, into = c("Author", "Volume", "Issue"), sep = "_", convert = TRUE)
# A tibble: 3 x 3
#  Author         Volume Issue
#  <chr>           <int> <int>
#1 Hughson.George     54     4
#2 Ifran.Dean         51     3
#3 Houston.Amanda     49     6

data

str1 <- c("Hughson.George_54_4", "Ifran.Dean_51_3", "Houston.Amanda_49_6")

Upvotes: 0

Onyambu

Reputation: 79288

You are looking for read.table:

read.table(text = string, sep ='_', col.names = c('Author', 'Volume', 'Issue'))

          Author Volume Issue
1 Hughson.George     54     4
2     Ifran.Dean     51     3
3 Houston.Amanda     49     6

where

string <- c("Hughson.George_54_4", "Ifran.Dean_51_3", "Houston.Amanda_49_6")

Edit: if some file names have fewer fields than others, add fill = TRUE so the missing fields are padded:

 read.table(text = string, sep ='_', fill=TRUE)
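A minimal sketch of what `fill = TRUE` changes, assuming a file name that lacks the issue field. The vector `string2` is hypothetical and not from the question.

```r
# Hypothetical input; the second name has no issue number
string2 <- c("Hughson.George_54_4", "Ifran.Dean_51")

# Without fill = TRUE, read.table() stops because line 2 has only 2 fields.
# With it, the short row is padded with NA.
df <- read.table(text = string2, sep = "_", fill = TRUE,
                 col.names = c("Author", "Volume", "Issue"),
                 stringsAsFactors = FALSE)

print(df)
```

The `Issue` column should then hold `NA` for the second row instead of raising an error.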

Upvotes: 1

James

Reputation: 66844

The [^a-z] part of your regex is matching the _ preceding the last digit. Just use something to match only the digits at the end:

x1 <- c("Hughson.George_54_4", "Ifran.Dean_51_3", "Houston.Amanda_49_6")

str_extract(x1,"([^_]+$)")
[1] "4" "3" "6"

str_extract(x1,"\\d+$")
[1] "4" "3" "6"
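To see the difference side by side, here is a small sketch that runs the pattern from the question next to the digits-only pattern. It uses base R's `regmatches` and `regexpr` with `perl = TRUE` instead of stringr purely so it runs without extra packages; that substitution is my assumption, not part of the question.

```r
x1 <- c("Hughson.George_54_4", "Ifran.Dean_51_3", "Houston.Amanda_49_6")

# Pattern from the question: the leading [^a-z] must consume one character
# that is not a lowercase letter, which ends up being the final underscore
original <- "[^a-z](?:[^_]+_){0}([^_ ]+$)"
with_underscore <- regmatches(x1, regexpr(original, x1, perl = TRUE))

# Matching only digits at the end of the string leaves the underscore out
digits_only <- regmatches(x1, regexpr("\\d+$", x1, perl = TRUE))

print(with_underscore)  # "_4" "_3" "_6"
print(digits_only)      # "4" "3" "6"
```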

Your overall aim seems like a job for strsplit though:

data.frame(do.call("rbind",strsplit(sub("\\."," ",x1),"_")))
              X1 X2 X3
1 Hughson George 54  4
2     Ifran Dean 51  3
3 Houston Amanda 49  6
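The result above has generic column names and character columns. One possible way to get named, typed columns from the same split is sketched below; the names `parts` and `df` are just placeholders.

```r
x1 <- c("Hughson.George_54_4", "Ifran.Dean_51_3", "Houston.Amanda_49_6")

# Split each name on the literal underscore and stack the pieces as rows
parts <- do.call(rbind, strsplit(x1, "_", fixed = TRUE))

df <- data.frame(
  Author = sub(".", " ", parts[, 1], fixed = TRUE),  # first dot becomes a space
  Volume = as.integer(parts[, 2]),
  Issue  = as.integer(parts[, 3]),
  stringsAsFactors = FALSE
)

print(df)
```

This assumes every name has exactly three underscore-separated fields; otherwise `rbind` would recycle values and the columns would not line up.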

Upvotes: 0
