Reputation: 325
I am scraping parts of a webpage and then inserting the results into MySQL.
The relevant part of the page source is:
<span class="profilelastlogin">
31,
Kiev, Ukraine
</span>
I want to select the 3 items (age, city, country) and then assign each of them to its own variable.
I am using this regex to select the full string, but it doesn't work. I would appreciate any guidance.
$regexAgeCityCountry = '/<span class="profilelastlogin">(.*?)<\/span>/';
preg_match_all($regexAgeCityCountry, $page, $outputAgeCityCountry);
Upvotes: 0
Views: 889
Reputation: 8781
Why not just match 3 separate groups?
/<span class="profilelastlogin">(.*?),(.*?),(.*?)<\/span>/s
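Group 1 contains the age, group 2 the city, and group 3 the country. The groups also capture the surrounding whitespace and line breaks, so trim() each one.
You could also use this regex to make sure the age is always numeric (the \s* skips the line break after the opening tag):
/<span class="profilelastlogin">\s*([0-9]+),(.*?),(.*?)<\/span>/s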
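As a rough sketch of how the three groups could end up in separate variables: the snippet below assumes the markup looks exactly like the sample in the question, and $page here is a stand-in string rather than the real scraped page. It uses \s* around each field instead of the s modifier, which is a variation on the pattern above, not the only way to do it.

```php
<?php
// Stand-in for the scraped HTML; copied from the sample in the question.
$page = "<span class=\"profilelastlogin\">\n31,\nKiev, Ukraine\n</span>";

// \s* absorbs spaces and line breaks around each field, so the captured
// groups come back already stripped of surrounding whitespace.
$pattern = '/<span class="profilelastlogin">\s*([0-9]+)\s*,\s*([^,<]+?)\s*,\s*([^<]+?)\s*<\/span>/';

$age = $city = $country = null;
if (preg_match($pattern, $page, $m)) {
    $age     = (int) $m[1]; // digits only, cast to integer
    $city    = $m[2];
    $country = $m[3];
}
```

If the span is missing or has a different number of fields, preg_match() returns 0 and the three variables stay null, so check them before inserting into the database.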
Upvotes: 1
Reputation: 67968
<span class="profilelastlogin">\s+\K|\G(?!^)([^,]+),?\s*(?=[\s\S]*<\/span>)
You can try this to capture the 3 parts. See the demo:
https://www.regex101.com/r/rK5lU1/28
$re = "/<span class=\"profilelastlogin\">\\s+\\K|\\G(?!^)([^,]+),?\\s*(?=[\\s\\S]*<\\/span>)/mi";
$str = "<span class=\"profilelastlogin\">\n 31,\n Kiev, Ukraine\n </span>";
preg_match_all($re, $str, $matches);
Upvotes: 0
Reputation: 119
Put all the data in one variable first, then split it on the commas:
$arr = explode(',', $yourvariable);
$age = trim($arr[0]);
$city = trim($arr[1]);
$country = trim($arr[2]);
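A slightly more defensive version of the same idea might look like this. The function name splitProfileLine is made up for illustration, and the sketch assumes PHP 7.1 or later and that the span always holds exactly three comma-separated fields.

```php
<?php
// Hypothetical helper: splits "age, city, country" text into named fields.
// Returns null when the text does not contain exactly three fields.
function splitProfileLine(string $raw): ?array
{
    // explode() splits on the commas; trim() removes spaces and line breaks.
    $parts = array_map('trim', explode(',', $raw));
    if (count($parts) !== 3) {
        return null;
    }
    [$age, $city, $country] = $parts;
    return ['age' => (int) $age, 'city' => $city, 'country' => $country];
}

// Inner text of the span from the question, line breaks included.
$profile = splitProfileLine("\n31,\nKiev, Ukraine\n");
```

Returning null for an unexpected field count avoids undefined-index notices when a profile is missing one of the values.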
Upvotes: 0
Reputation:
You can use the s (PCRE_DOTALL) modifier so that the '.' also matches newline characters and the multi-line span is treated as a single line.
Here is the PHP reference:
If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
Here is a working example with a fix
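A minimal sketch of the difference, assuming the page contains the span exactly as shown in the question ($page is a stand-in string here, and the variable names with "Without" and "With" are only for comparison):

```php
<?php
// Stand-in for the scraped HTML; copied from the sample in the question.
$page = "<span class=\"profilelastlogin\">\n31,\nKiev, Ukraine\n</span>";

// Original pattern: '.' stops at the first line break, so nothing matches.
$countWithout = preg_match_all(
    '/<span class="profilelastlogin">(.*?)<\/span>/',
    $page,
    $outputWithout
);

// Same pattern with the trailing "s" modifier: '.' now crosses line breaks.
$countWith = preg_match_all(
    '/<span class="profilelastlogin">(.*?)<\/span>/s',
    $page,
    $outputAgeCityCountry
);

// $outputAgeCityCountry[1][0] holds the raw inner text of the first span.
$inner = trim($outputAgeCityCountry[1][0]);
```

The captured text still contains the commas and the inner line break, so it needs to be split and trimmed before the values go into separate variables.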
Upvotes: 1