Zaffar Saffee

Reputation: 6305

Extract text between two words in PHP

I have the following URL:

http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego

and I want to extract

B000NO9GT4

That is the ASIN. So far I can search within the string, but not in the way I need. I looked at the split function and at explode, but I can't find a way to do it. The URLs will also differ in length, so I can't hardcode the length either. The only thing that makes sense to me is to split the string so that

http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/

becomes the first part

and

B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego

becomes the second part. From the second part, I should extract B000NO9GT4.

In the same way, I would like to get the product name LEGO-Ultimate-Building-Set-Pieces from the first part.

I am very bad at regex and can't find a way to do this.

Can somebody guide me on how to do it in PHP?

Thanks

Upvotes: 1

Views: 1497

Answers (2)

drew010

Reputation: 69937

This grabs both pieces of information that you are looking to capture:

$url = 'http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego';

$path = parse_url($url, PHP_URL_PATH);

if (preg_match('#^/([^/]+)/dp/([^/]+)/#i', $path, $matches)) {
    echo "Description = {$matches[1]}<br />"
        ."ASIN = {$matches[2]}<br />";
}

Output:

Description = LEGO-Ultimate-Building-Set-Pieces
ASIN = B000NO9GT4

Short Explanation:

  • Any expressions enclosed in ( ) will be saved as a capture group. This is how we get at the data in $matches[1] and $matches[2].
  • The expression ([^/]+) matches one or more characters that are not /, so in effect it captures everything in the URL between two / separators. I use this pattern twice. The [ ] defines a character class, here containing only /, and the leading ^ negates it, so instead of matching / it matches everything but /. As another example, [a-f0-9] matches the letters a through f and the digits 0 through 9; [^a-f0-9] matches the opposite.
  • # is used as the delimiter for the expression, which avoids having to escape the / characters inside the pattern.
  • ^ right after the opening delimiter anchors the match to the beginning of the string.
  • The i after the closing delimiter makes the match case-insensitive.

See www.regular-expressions.info and PCRE Pattern Syntax for more info on how regexps work.
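As a variation on the pattern explained above, here is a minimal sketch that wraps the match in a function and uses named capture groups. The function name extractProductParts and the group names description and asin are made up for illustration. It also assumes you want URLs that end right after the ASIN (no trailing /) to match, which the (?:/|$) part allows.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the
// description and ASIN from an Amazon-style product URL, or null if the
// path does not look like /<description>/dp/<asin>.
function extractProductParts(string $url): ?array
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // malformed URL or no path component
    }

    // (?<name>...) is a named capture group. (?:/|$) is a non-capturing
    // group that accepts either a following slash or the end of the path.
    $pattern = '#^/(?<description>[^/]+)/dp/(?<asin>[^/]+)(?:/|$)#i';
    if (preg_match($pattern, $path, $m)) {
        return ['description' => $m['description'], 'asin' => $m['asin']];
    }

    return null;
}

$parts = extractProductParts('http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4');
if ($parts !== null) {
    echo "Description = {$parts['description']}\n";
    echo "ASIN = {$parts['asin']}\n";
}
```

Returning null on a failed match lets the caller tell "no product in this URL" apart from an empty value.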

Upvotes: 2

Baba

Reputation: 95101

You can try splitting the path on / and picking the segments by position:

$str = "http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego" ;
list(,$desc,,$num,) = explode("/",parse_url($str,PHP_URL_PATH));
var_dump($desc,$num);

Output

string 'LEGO-Ultimate-Building-Set-Pieces' (length=33)
string 'B000NO9GT4' (length=10)
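The positional approach above relies on the description and ASIN always sitting in the same path segments. If that is not guaranteed, one possible variation is to locate the dp segment and take its neighbours. This is only a sketch: the function name splitProductPath is hypothetical, and it assumes the layout is always <description>/dp/<asin> somewhere in the path.

```php
<?php
// Hypothetical helper (not part of the original answer): finds the "dp"
// path segment and returns the segments immediately before and after it.
function splitProductPath(string $url): ?array
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // malformed URL or no path component
    }

    // Trim the outer slashes so explode() does not produce empty segments
    // at the start or end of the array.
    $segments = explode('/', trim($path, '/'));

    // Strict search for the literal "dp" segment.
    $i = array_search('dp', $segments, true);
    if ($i === false || $i === 0 || !isset($segments[$i + 1])) {
        return null; // no "dp", nothing before it, or nothing after it
    }

    return ['description' => $segments[$i - 1], 'asin' => $segments[$i + 1]];
}

var_dump(splitProductPath('http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?keywords=lego'));
```

Because the lookup is relative to dp rather than to a fixed index, extra leading path segments would not shift the result.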

Upvotes: 2
