Reputation: 17
I have files with names like
I'd like to create a data frame where each row is information extracted from a file name in the form of Author, Volume, Issue.
I'm able to extract the name and volume, but can't seem to get the issue number. Using the "stringr" package, I've done the following, which gives me _4 instead of just 4.
[^a-z](?:[^_]+_){0}([^_ ]+$)
How do I fix this?
Upvotes: 0
Views: 1173
Reputation: 887531
If it is the last digit, we can just use base R methods to extract it
as.numeric(substring(str1, nchar(str1)))
Or with sub
as.numeric(sub(".*_", "", str1))
#[1] 4 3 6
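One caveat worth noting: substring(str1, nchar(str1)) returns only the final character, so it would truncate a multi-digit issue number, whereas the sub() version keeps everything after the last underscore. A minimal sketch, using a made-up vector str2 (an assumption, not data from the question) that contains a two-digit issue:

```r
# Hypothetical input: the second name has a two-digit issue number
str2 <- c("Hughson.George_54_4", "Ifran.Dean_51_12")

# Takes only the last character of each string
last_char <- as.numeric(substring(str2, nchar(str2)))

# Removes everything up to and including the last underscore
after_last_underscore <- as.numeric(sub(".*_", "", str2))

print(last_char)              # 4 2
print(after_last_underscore)  # 4 12
```

If issue numbers can exceed 9, the sub() form is probably the safer of the two.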
If we need to split it into individual columns, one option is separate from tidyverse, which splits the column into individual columns based on the delimiter (_) and, with convert = TRUE, also converts the column types
library(tidyverse)
# tibble() replaces the deprecated data_frame()
tibble(col1 = str1) %>%
  separate(col1, into = c("Author", "Volume", "Issue"), sep = "_", convert = TRUE)
# A tibble: 3 x 3
#   Author         Volume Issue
#   <chr>           <int> <int>
# 1 Hughson.George     54     4
# 2 Ifran.Dean         51     3
# 3 Houston.Amanda     49     6
str1 <- c("Hughson.George_54_4", "Ifran.Dean_51_3", "Houston.Amanda_49_6")
Upvotes: 0
Reputation: 79288
You are looking for:
read.table(text = string, sep ='_', col.names = c('Author', 'Volume', 'Issue'))
          Author Volume Issue
1 Hughson.George     54     4
2     Ifran.Dean     51     3
3 Houston.Amanda     49     6
where
string <- c("Hughson.George_54_4", "Ifran.Dean_51_3", "Houston.Amanda_49_6")
Edit: if some file names are missing a field, fill = TRUE pads the short rows with NA:
read.table(text = string, sep ='_', fill=TRUE)
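To illustrate what fill = TRUE changes, here is a small sketch with a hypothetical vector string2 (an assumption, not the asker's data) in which one name lacks the issue field. Without fill = TRUE, read.table stops with an error about the short line; with it, the missing issue should come out as NA.

```r
# Hypothetical input: the second name has no issue field
string2 <- c("Hughson.George_54_4", "Ifran.Dean_51")

# fill = TRUE pads rows that have fewer fields than the others
padded <- read.table(text = string2, sep = "_", fill = TRUE,
                     col.names = c("Author", "Volume", "Issue"))

print(padded)
```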
Upvotes: 1
Reputation: 66844
The [^a-z] part of your regex is matching the _ preceding the last digit. Just use something to match only the digits at the end:
library(stringr)
x1 <- c("Hughson.George_54_4", "Ifran.Dean_51_3", "Houston.Amanda_49_6")
str_extract(x1,"([^_]+$)")
[1] "4" "3" "6"
str_extract(x1,"\\d+$")
[1] "4" "3" "6"
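The difference between the whole match and the capture group can be checked directly. As a sketch: str_extract() returns the entire matched text, while str_match() returns a matrix whose first column is the whole match and whose later columns are the capture groups. So the pattern from the question does capture the digit, just not as the full match.

```r
library(stringr)  # assumes the stringr package is installed

x1 <- c("Hughson.George_54_4", "Ifran.Dean_51_3", "Houston.Amanda_49_6")
pattern <- "[^a-z](?:[^_]+_){0}([^_ ]+$)"  # regex from the question

# Whole match: includes the underscore consumed by [^a-z]
whole_match <- str_extract(x1, pattern)

# Column 2 of the str_match() matrix holds the first capture group
first_group <- str_match(x1, pattern)[, 2]

print(whole_match)  # "_4" "_3" "_6"
print(first_group)  # "4" "3" "6"
```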
Your overall aim seems like a job for strsplit, though:
data.frame(do.call("rbind",strsplit(sub("\\."," ",x1),"_")))
              X1 X2 X3
1 Hughson George 54  4
2     Ifran Dean 51  3
3 Houston Amanda 49  6
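The strsplit() result is a character matrix, so the columns come out unnamed and as text. If named columns and integer volume and issue numbers are wanted, one possible extension (a sketch that keeps the author as a single column and does not replace the dot) is:

```r
x1 <- c("Hughson.George_54_4", "Ifran.Dean_51_3", "Houston.Amanda_49_6")

# Split on the literal underscore and stack the pieces into a 3-column matrix
parts <- do.call(rbind, strsplit(x1, "_", fixed = TRUE))

# Name the columns and convert the numeric fields explicitly
issues <- data.frame(
  Author = parts[, 1],
  Volume = as.integer(parts[, 2]),
  Issue  = as.integer(parts[, 3]),
  stringsAsFactors = FALSE
)

str(issues)
```

This assumes every name has exactly three underscore-separated fields; rbind() would recycle or warn on rows of unequal length.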
Upvotes: 0