bbnn

Reputation: 3612

Wikipedia links regex in PHP

How can I extract only the text inside [[...]] into an array?

[[旭川市|旭川]](文化) - [[アイヌ]]文化、[[旭川市旭山動物園|旭山動物園]]など

I tried \[\[.*]] but it didn't work. Maybe that is because .* only works for English strings?

Upvotes: 1

Views: 763

Answers (4)

hippietrail

Reputation: 16974

One problem is that you're using the greedy wildcard: \[\[.*]] will match from the first [[ to the last ]], including any intervening ]].

Most regex engines now also include a non-greedy quantifier, typically written *?, so \[\[.*?]] would match just one wikilink at a time.
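To make the difference concrete, here is a minimal sketch in Python 3 (not PHP) that runs both patterns against the sample line from the question. The variable names `text`, `greedy`, and `lazy` are just illustrative, and the behaviour should carry over to PCRE, though that is an assumption rather than something tested here.

```python
import re

# Sample line copied from the question.
text = "[[旭川市|旭川]](文化) - [[アイヌ]]文化、[[旭川市旭山動物園|旭山動物園]]など"

# Greedy: .* runs on to the LAST "]]", so everything collapses into one match.
greedy = re.findall(r"\[\[.*]]", text)

# Non-greedy: .*? stops at the FIRST "]]", giving one match per wikilink.
lazy = re.findall(r"\[\[.*?]]", text)

print(len(greedy))  # 1
print(lazy)         # ['[[旭川市|旭川]]', '[[アイヌ]]', '[[旭川市旭山動物園|旭山動物園]]']
```

Note that the Japanese text is matched fine in both cases; the original pattern failed because of greediness, not because of the non-English characters.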

Upvotes: 0

bcosca

Reputation: 17555

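// The /u modifier treats pattern and subject as UTF-8; (.+?) lazily captures the text between [[ and ]].
preg_match_all('/\[\[(.+?)\]\]/u', $str, $matches);
var_dump($matches); // $matches[1] holds the captured link texts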

Upvotes: 2

jcomeau_ictx

Reputation: 38462

You need to backslash both sides: all the square brackets need to be escaped.

This worked in Python; it may need modification for PHP:


>>> re.compile('\[\[(.*?)\]\]')
<_sre.SRE_Pattern object at 0xb747ebf0>
>>> r=_
>>> r.search(text)
<_sre.SRE_Match object at 0xb7469560>
>>> r.findall(text)
['\xe6\x97\xad\xe5\xb7\x9d\xe5\xb8\x82|\xe6\x97\xad\xe5\xb7\x9d', '\xe3\x82\xa2\xe3\x82\xa4\xe3\x83\x8c', '\xe6\x97\xad\xe5\xb7\x9d\xe5\xb8\x82\xe6\x97\xad\xe5\xb1\xb1\xe5\x8b\x95\xe7\x89\xa9\xe5\x9c\x92|\xe6\x97\xad\xe5\xb1\xb1\xe5\x8b\x95\xe7\x89\xa9\xe5\x9c\x92']

Hmm, maybe I'm wrong about having to escape the right square brackets; it turned out that wasn't necessary in Python.
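For anyone trying this today, here is a rough Python 3 equivalent of the session above, where strings are Unicode by default and the results print as readable text rather than escaped bytes. The names `text`, `escaped`, and `unescaped` are made up for this sketch; it also checks the point about the right square brackets by running the pattern both ways.

```python
import re

# Sample line copied from the question.
text = "[[旭川市|旭川]](文化) - [[アイヌ]]文化、[[旭川市旭山動物園|旭山動物園]]など"

# Right brackets escaped, as in the session above.
escaped = re.findall(r"\[\[(.*?)\]\]", text)

# Right brackets left bare: outside a character class "]" is a literal.
unescaped = re.findall(r"\[\[(.*?)]]", text)

print(escaped)               # ['旭川市|旭川', 'アイヌ', '旭川市旭山動物園|旭山動物園']
print(escaped == unescaped)  # True
```

With a capturing group in the pattern, `findall` returns only the captured text, so the surrounding brackets are already stripped.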

Upvotes: 0

Brettski

Reputation: 20091

You can encode the Unicode first:

[&#26093;&#24029;&#24066;&#26093;&#23665;&#21205;&#29289;&#22290;&#124;&#26093;&#23665;&#21205;&#29289;&#22290;&#93;&#93;&#12394;&#12393;]

Upvotes: 0
