Reputation: 13
I want to run the following regex query in Solr: name:/.+\.m+d$/. I have documents in my index with the following names:
readme.md
2013.02.26.md
test.mmd
and none of them match. Removing the $ matches the readme.md entry. I believe the problem is that I need to specify a global pattern modifier, but I can't find the syntax to do this.
Upvotes: 1
Views: 1722
Reputation: 11023
These are my observations based on experimenting with Solr regex matches:
Percent-encode (URL-encode) all the special characters in your regex when you pass it in the request URL. This site has been helpful for doing the percent encoding manually.
Make sure you do regex matching on string fields if you want to match the entire value. Regex matching on text fields will involve tokenization and will work according to which tokens got produced during indexing.
For Solr regexes, don't specify the beginning anchor ^ or the end anchor $, since the pattern is always matched against the entire string. Unless you put a .* or .+ (or some such regex) at the beginning or the end, the query behaves as if it had ^ at the beginning and $ at the end.
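To illustrate the implicit anchoring, here is a rough Python sketch. Python's re module is not the Lucene engine that Solr uses, so this only approximates the behaviour: re.fullmatch stands in for Solr's whole-value matching, and the second pattern assumes that a trailing $ is treated as an ordinary character rather than as an anchor.

```python
import re

# The three values from the question, as they would sit in a string field.
names = ["readme.md", "2013.02.26.md", "test.mmd"]

# Whole-value matching: approximated here with re.fullmatch (an assumption,
# since Solr uses Lucene's regex engine, not Python's).
pattern = re.compile(r".+\.m+d")
full_matches = [n for n in names if pattern.fullmatch(n)]

# Assumption: a trailing $ is read as a literal dollar sign, so the value
# would have to end in "$" to match. None of the names do.
literal_dollar = re.compile(r".+\.m+d\$")
dollar_matches = [n for n in names if literal_dollar.fullmatch(n)]

print(full_matches)
print(dollar_matches)
```

Under those assumptions the first list contains all three names and the second is empty, which is consistent with the query failing when the $ is left in.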
I just indexed the 3 values in your question in a string field and issued this query and it matches all the 3 documents:
q=id:/.%2B%5C.m%2Bd/
The decoded form of .%2B%5C.m%2Bd is .+\.m+d which, because of the implicit anchoring, behaves like the PCRE ^.+\.m+d$.
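If you would rather not encode the pattern by hand, a small sketch using Python's standard urllib.parse.quote produces the same string. The field name id and the q= parameter are taken from the query above; the variable names are only illustrative.

```python
from urllib.parse import quote

# The raw regex, without ^ or $ anchors.
regex = r".+\.m+d"

# safe="" asks quote() to encode every reserved character; letters, digits
# and the characters _ . - ~ are always left as they are.
encoded = quote(regex, safe="")

# Build the query parameter in the same shape as the one shown above.
query = "q=id:/" + encoded + "/"

print(encoded)
print(query)
```

Here + becomes %2B and the backslash becomes %5C, while the dots are left alone.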
Upvotes: 3
Reputation: 1147
I tried this in RegexBuddy. It matches your test names.
.+\.m+d
PHP (preg) syntax for iterating over all matches in a string:
preg_match_all('/.+\.m+d/', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
# Matched text = $result[0][$i];
}
This is the variant where ^ and $ match at line breaks, the dot matches newlines, and matching is case-insensitive:
preg_match_all('/.+\.m+d/sim', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
# Matched text = $result[0][$i];
}
Upvotes: 0