johncarter

Reputation: 35

Separating multiple numbers from a list of numbers and letters in R

I have a list that looks something like:

list <- c("2 chairs.", "1 chair & 4 books.", 
         "Sitting on 1 couch. Another 4 chairs & 3 books.", 
         NA, "1 chair.", 
         "3 books")

My real vector has more than 10,000 elements, but this abbreviated version captures all the variations. I need to extract only the number before chair(s) and the number before book(s). I would prefer to end up with a list of vectors, where some elements contain two numbers, some contain one number, and some contain only NA.

I have tried gsub() and strsplit() in a variety of ways to get the result I want, but with no luck.

Edit: Maybe I should have been more specific in my question above. I need the results to be numeric, not numbers stored as strings. I would also prefer that the NA values remain NA. Thanks.

Upvotes: 2

Views: 77

Answers (2)

akrun

Reputation: 887148

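We can use str_extract_all from the stringr package. The `+` after `[0-9]` makes multi-digit counts such as 12 match in full, and the lookahead accepts both singular and plural forms.

library(stringr)
str_extract_all(list, "[0-9]+(?=\\s*(book|chair))")
#[[1]]
#[1] "2"

#[[2]]
#[1] "1" "4"

#[[3]]
#[1] "4" "3"

#[[4]]
#[1] NA

#[[5]]
#[1] "1"

#[[6]]
#[1] "3"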
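
Since the edit to the question asks for numeric values, one possible follow-up is to convert each element with as.numeric(). This is only a sketch, assuming stringr is installed; `items`, `matches` and `counts` are names introduced here for illustration (using `items` also avoids masking base R's list() function).

```r
library(stringr)

# `items` is a hypothetical name for the vector from the question
items <- c("2 chairs.", "1 chair & 4 books.",
           "Sitting on 1 couch. Another 4 chairs & 3 books.",
           NA, "1 chair.",
           "3 books")

# one or more digits, followed by optional whitespace and "book" or "chair"
matches <- str_extract_all(items, "[0-9]+(?=\\s*(book|chair))")

# convert each character vector to numeric; an NA string becomes a numeric NA
counts <- lapply(matches, as.numeric)
```

With this approach the NA input should stay NA in `counts`, because str_extract_all() returns NA for a missing string and as.numeric() passes it through.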

Upvotes: 2

thelatemail

Reputation: 93813

For multiple matches per string, try the following, where x is the character vector from the question:

regmatches(x, gregexpr("\\d+(?= (chair|book))", x, perl=TRUE))
#[[1]]
#[1] "2"
#
#[[2]]
#[1] "1" "4"
#
#[[3]]
#[1] "4" "3"
#
#[[4]]
#character(0)
#
#[[5]]
#[1] "1"
#
#[[6]]
#[1] "3"

I imagine stringr's str_extract_all or its cousins would do a similar job.
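
To get numeric results and keep NA as NA in base R, something along these lines might work. It is a sketch rather than a tested solution for the full 10k+ vector; `items`, `m` and `counts` are hypothetical names chosen for this example.

```r
# `items` is a hypothetical name for the vector from the question
items <- c("2 chairs.", "1 chair & 4 books.",
           "Sitting on 1 couch. Another 4 chairs & 3 books.",
           NA, "1 chair.",
           "3 books")

# digits followed by a single space and "chair" or "book" (PCRE lookahead)
m <- regmatches(items, gregexpr("\\d+(?= (chair|book))", items, perl = TRUE))

# convert each set of matches from character to numeric
counts <- lapply(m, as.numeric)

# regmatches() gives character(0) for NA inputs, so restore NA explicitly
counts[is.na(items)] <- list(NA_real_)
```

Strings that are not NA but contain no chair or book count would be left as numeric(0), which keeps them distinguishable from genuinely missing inputs.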

Upvotes: 3
