Nastya

Reputation: 11

How to extract information from text in R and create a new column?

There is a column in my data with information about a product:

"Technical Details Manufacturer recommended age:14 years and up Manufacturer reference176-1308 Scale1::160 Track Width/GaugeNo Additional Information ....".

How do I extract only age from that text and put it in a separate column?

The expected output would be the number after "age:", 14.

I probably need to use the stringr package and its str_extract function, but it is not clear to me how to do that.

Upvotes: 1

Views: 40

Answers (2)

DJack

Reputation: 4940

An alternative solution:

s <- "Technical Details Manufacturer recommended age:14 years and up Manufacturer reference176-1308 Scale1::160 Track Width/GaugeNo Additional Information ...."

sub(".*age:(\\d+).*", "\\1", s)
#[1] "14"
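  • (\\d+): Captures one or more digits; "\\1" in the replacement refers back to that captured group
  • The leading and trailing .* match everything else, so the whole string is replaced by the digits alone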
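Since the question asks for a new column, here is a minimal sketch of applying the same pattern to a whole data frame. The data frame `df` and the column names `info` and `age` are assumptions made up for illustration, not taken from the question. Note that sub() returns its input unchanged when the pattern does not match, so rows without an age are set to NA explicitly.

```r
# Hypothetical data; the names `df`, `info` and `age` are assumptions
df <- data.frame(
  info = c(
    "Technical Details Manufacturer recommended age:14 years and up Manufacturer reference176-1308",
    "Technical Details Manufacturer recommended age:3 years and up",
    "Technical Details Scale1::160"  # no age in this row
  ),
  stringsAsFactors = FALSE
)

pattern <- ".*age:(\\d+).*"

# sub() is vectorised over the column, but it leaves non-matching
# strings untouched, so only convert the rows that actually match
has_age <- grepl(pattern, df$info)
df$age <- NA_integer_
df$age[has_age] <- as.integer(sub(pattern, "\\1", df$info[has_age]))

df$age
```

Rows where the text has no "age:" followed by digits end up as NA instead of carrying the full original string into the new column.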

Upvotes: 1

Rui Barradas

Reputation: 76402

Maybe there are simpler regular expressions but this one seems to work.

s <- "Technical Details Manufacturer recommended age:14 years and up Manufacturer reference176-1308 Scale1::160 Track Width/GaugeNo Additional Information ...."
s

sub(".*age[^[:digit:]]*([[:digit:]]*).*", "\\1", s)
#[1] "14"

If you want the output as a number,

num <- sub(".*age[^[:digit:]]*([[:digit:]]*).*", "\\1", s)
num <- as.integer(num)
num
#[1] 14

You can do the above in one step by wrapping the sub() call in as.integer(): num <- as.integer(sub(...)).
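Written out in full, the one-step version might look like the sketch below. The stringr part is an optional extra for the package mentioned in the question; it only runs if stringr is installed, and the name `num_stringr` is just an illustrative choice.

```r
s <- "Technical Details Manufacturer recommended age:14 years and up Manufacturer reference176-1308 Scale1::160 Track Width/GaugeNo Additional Information ...."

# Extract the digits after "age" and convert to integer in a single call
num <- as.integer(sub(".*age[^[:digit:]]*([[:digit:]]*).*", "\\1", s))
num

# Optional: the same idea with stringr, if the package is available.
# str_match() returns a matrix; column 2 holds the first capture group.
if (requireNamespace("stringr", quietly = TRUE)) {
  num_stringr <- as.integer(stringr::str_match(s, "age:(\\d+)")[, 2])
  print(num_stringr)
}
```

One thing to keep in mind: because the leading .* is greedy, this pattern picks up the last "age" in the string, so a later word containing "age" could change the result on other rows.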

Upvotes: 0
