HoKyun
Reputation: 1

gsub fails to extract a URL in R: what did I miss?

I am trying to extract a URL, but my code does not work when I run it. What did I miss? Any help would be great.

x$URL <- gsub("(.*)(http://www.bloomin.com)(.jpg)(.)",
"//2//3", x$Product.Description.)

[1] //2//3

This is what was returned. I want to get http://www.blooming.com/image/xxxxxxxx.jpg from the vector below.

<div>Colorful Floor chair Series</div><div><br /></div><div>Soft
Suede</div><div><br /></div><div>Cute bubble design</div><div><br
/></div><div><p align="center"><p align="center"><img
src="http://gdetail.image-gemkt.com/186/716088198/2010/2/e3b117e2-a7bd-4d.GIF"
/></div><div><p align="center"><p align="center"><img
src="http://www.blooming.com/image/xxxxxxxx.jpg" /></div>

Upvotes: 0

Views: 124

Answers (1)

Avinash Raj

Reputation: 174696

  1. Backreferences must be written with a backslash, not a forward slash.

  2. Use .*? (non-greedy) to match all the characters that exist between .com and the file extension .jpg

    # perl = TRUE selects PCRE, which supports the (?s) flag
    x$URL <- gsub("(?s).*\\b(http://www\\.blooming\\.com\\b.*?\\.jpg\\b).*",
                  "\\1", x$Product.Description., perl = TRUE)
    

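As a rough check of the two points above, here is a minimal sketch run on the HTML from the question. The one-row data frame `x` and the variables `html` and `wrong` are made up for illustration, and the call assumes `perl = TRUE` so that `(?s)` is handled by PCRE.

```r
# Sample input copied from the question, joined into a single string.
html <- paste0(
  '<div>Colorful Floor chair Series</div><div><br /></div>',
  '<div><p align="center"><p align="center"><img ',
  'src="http://gdetail.image-gemkt.com/186/716088198/2010/2/e3b117e2-a7bd-4d.GIF" /></div>',
  '<div><p align="center"><p align="center"><img ',
  'src="http://www.blooming.com/image/xxxxxxxx.jpg" /></div>'
)

# Hypothetical data frame standing in for the asker's `x`.
x <- data.frame(Product.Description. = html, stringsAsFactors = FALSE)

# Forward slashes are literal text, so the whole match becomes "//2//3".
wrong <- gsub("(.*)(http://www\\.blooming\\.com)(.*)", "//2//3", html)
print(wrong)   # "//2//3"

# "\\1" refers to the first capture group; .*? stops at the first ".jpg".
x$URL <- gsub("(?s).*\\b(http://www\\.blooming\\.com\\b.*?\\.jpg\\b).*",
              "\\1", x$Product.Description., perl = TRUE)
print(x$URL)   # "http://www.blooming.com/image/xxxxxxxx.jpg"
```

Because the pattern consumes the entire string and the replacement keeps only group 1, a description that contains no matching URL is returned unchanged, so it may be worth checking for that case separately.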

Upvotes: 3
