Reputation: 35
I have a list that looks something like:
list <- c("2 chairs.", "1 chair & 4 books.",
"Sitting on 1 couch. Another 4 chairs & 3 books.",
NA, "1 chair.",
"3 books")
My list is actually 10k+ long, but this abbreviated list captures all the variations. I need to extract the number before chair(s) and the number before book(s) only. I prefer to end up with a list of lists where some lists will include two numbers, some lists will include one number and some lists will have only NA.
I have tried gsub() and strsplit() in a variety of ways to obtain the final result that I want, with no luck.
Edit: Maybe I should have been more specific in my question above. I need the result to be numeric and not a number as a string. I would also prefer to have the NA values remain as NA. Thanks.
Upvotes: 2
Views: 77
Reputation: 887148
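We can use str_extract_all from the stringr package. The + lets it capture numbers with more than one digit:
library(stringr)
str_extract_all(list, "[0-9]+(?=\\s*(books?|chairs?))")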
#[[1]]
#[1] "2"
#[[2]]
#[1] "1" "4"
#[[3]]
#[1] "4" "3"
#[[4]]
#[1] NA
#[[5]]
#[1] "1"
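The question's edit asks for numeric values with NA kept as NA. A minimal sketch of one way to get there from the stringr result, assuming stringr is installed; the names items, res and nums are only illustrative and not from the original post:

```r
library(stringr)

# Same data as in the question (named `items` here to avoid masking base::list)
items <- c("2 chairs.", "1 chair & 4 books.",
           "Sitting on 1 couch. Another 4 chairs & 3 books.",
           NA, "1 chair.", "3 books")

# Digits followed (after optional whitespace) by "book" or "chair"
res <- str_extract_all(items, "[0-9]+(?=\\s*(book|chair))")

# as.numeric() turns "1" into 1 and leaves a missing value as NA
nums <- lapply(res, as.numeric)
```

Here nums is a list of numeric vectors, and the element for the NA input should itself be NA rather than an empty vector.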
Upvotes: 2
Reputation: 93813
For multiple matches per string, try the following (x is the character vector from the question):
regmatches(x, gregexpr("\\d+(?= (chair|book))", x, perl=TRUE))
#[[1]]
#[1] "2"
#
#[[2]]
#[1] "1" "4"
#
#[[3]]
#[1] "4" "3"
#
#[[4]]
#character(0)
#
#[[5]]
#[1] "1"
I imagine str_extract or cousins would do a similar job.
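To get numeric output and keep NA as NA, as the edit to the question asks, something like the following might work in base R. It is only a sketch: matches and nums are illustrative names, and it assumes a missing input should become a single NA in the result.

```r
x <- c("2 chairs.", "1 chair & 4 books.",
       "Sitting on 1 couch. Another 4 chairs & 3 books.",
       NA, "1 chair.", "3 books")

# Every run of digits that is followed by " chair" or " book"
matches <- regmatches(x, gregexpr("\\d+(?= (chair|book))", x, perl = TRUE))

# Strings to numbers; an NA input gives character(0), hence numeric(0)
nums <- lapply(matches, as.numeric)

# Put NA back wherever the original input was NA
nums[is.na(x)] <- list(NA_real_)
```

Strings that contain no chair or book count still come back as numeric(0), which keeps them distinguishable from genuinely missing inputs.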
Upvotes: 3