h-y-b-r-i-d

Reputation: 325

Regex to select the content between span tags and then separate the result

I am scraping parts of a webpage and then inserting the results into MySQL.

The source code of the problem area is:

<span class="profilelastlogin">
                    31,
                Kiev, Ukraine
                </span>

I want to select the 3 items (age, city, country) and then assign each of them to its own variable.

I am using this regex to select the full string, but it doesn't work. I would appreciate any guidance.

$regexAgeCityCountry = '/<span class="profilelastlogin">(.*?)<\/span>/';
preg_match_all($regexAgeCityCountry, $page, $outputAgeCityCountry);

Upvotes: 0

Views: 889

Answers (4)

Rolf ツ

Reputation: 8781

Why not just match 3 separate groups?

 /<span class="profilelastlogin">(.*?),(.*?),(.*?)<\/span>/s

Group 1 contains the age, group 2 the city and group 3 contains the country.

You could also use this regex to make sure the age is always numeric (the \s* around the digits allows for the whitespace that surrounds the number in the source):

/<span class="profilelastlogin">\s*([0-9]+)\s*,(.*?),(.*?)<\/span>/s
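As a minimal sketch of how the three-group pattern could be used, assuming the sample markup from the question stands in for the scraped $page: the groups still contain the line breaks and indentation from the source, so each one is trimmed before use. The variable names $age, $city and $country are only illustrative.

```php
<?php
// Sample markup copied from the question; in practice $page holds the scraped HTML.
$page = "<span class=\"profilelastlogin\">\n                    31,\n                Kiev, Ukraine\n                </span>";

// Three lazy groups separated by commas; the s modifier lets "." cross line breaks.
$pattern = '/<span class="profilelastlogin">(.*?),(.*?),(.*?)<\/span>/s';

if (preg_match($pattern, $page, $m)) {
    // Each capture still carries the surrounding whitespace, so trim it.
    $age     = (int) trim($m[1]);
    $city    = trim($m[2]);
    $country = trim($m[3]);
}
```

preg_match is used here instead of preg_match_all because the sample contains a single span; with several profiles on one page, preg_match_all with the PREG_SET_ORDER flag would give one such set of groups per span.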

Upvotes: 1

vks

Reputation: 67968

<span class="profilelastlogin">\s+\K|\G(?!^)([^,]+),?\s*(?=[\s\S]*<\/span>)

You can try this to capture the 3 parts. See the demo:

https://www.regex101.com/r/rK5lU1/28

$re = "/<span class=\"profilelastlogin\">\\s+\\K|\\G(?!^)([^,]+),?\\s*(?=[\\s\\S]*<\\/span>)/mi";
$str = "<span class=\"profilelastlogin\">\n 31,\n Kiev, Ukraine\n </span>";

preg_match_all($re, $str, $matches);

Upvotes: 0

Param sohi

Reputation: 119

Put all the data in one variable first, then split it on the commas:

$arr = explode(',', $yourvariable);

$age = trim($arr[0]);

$city = trim($arr[1]);

$country = trim($arr[2]);
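A rough sketch of the whole explode route, under the assumption that the span content is first pulled out with the pattern from the question plus the s modifier. The sample $page is the markup from the question, and $parts is an illustrative name.

```php
<?php
// Sample markup copied from the question; in practice $page holds the scraped HTML.
$page = "<span class=\"profilelastlogin\">\n                    31,\n                Kiev, Ukraine\n                </span>";

// Capture everything between the tags; the s modifier is needed because
// the content spans several lines.
if (preg_match('/<span class="profilelastlogin">(.*?)<\/span>/s', $page, $m)) {
    // Split into at most 3 pieces and strip the whitespace around each one.
    $parts = array_map('trim', explode(',', $m[1], 3));
    list($age, $city, $country) = $parts;
}
```

The limit of 3 keeps any further commas inside the last piece. This still assumes the content always has the form age, city, country; a profile with a missing field would leave fewer than three pieces.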

Upvotes: 0

user2010925

Reputation:

You can use the s (PCRE_DOTALL) modifier so that the '.' also matches newline characters. The content of your span runs over several lines, which is why your pattern finds nothing without it.

Here is the PHP reference:

If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.

Here is a working example with a fix
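The linked example is not reproduced here. As a small sketch of the same idea, assuming the markup from the question as input, the following compares the original pattern with and without the s modifier; $countWithout, $countWith and $content are illustrative names.

```php
<?php
// Sample markup copied from the question; in practice $page holds the scraped HTML.
$page = "<span class=\"profilelastlogin\">\n                    31,\n                Kiev, Ukraine\n                </span>";

$withoutS = '/<span class="profilelastlogin">(.*?)<\/span>/';
$withS    = '/<span class="profilelastlogin">(.*?)<\/span>/s';

// Without s, "." stops at the first line break, so the closing tag is never reached.
$countWithout = preg_match_all($withoutS, $page, $outputWithout);

// With s, "." also matches line breaks and the whole span content is captured.
$countWith = preg_match_all($withS, $page, $outputAgeCityCountry);

// Group 1 of the first match, with the surrounding whitespace removed.
$content = trim($outputAgeCityCountry[1][0]);
```

The captured string still contains all three values in one piece, so it has to be split afterwards, for example on the commas.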

Upvotes: 1
