Reputation: 1083
I have this RegEx from my previous question. The problem is that sometimes it works and sometimes it doesn't. I tried pasting it into an online tester and got this: https://regex101.com/r/I3tnY4/3
The text is from a file I read using file_get_contents. The contents of the file are complete, but when I run it through the RegEx to filter it:
$data = file_get_contents($var);
$pat = '/intervals \[\d+\]:\s+\Kxmin = (?P<xmin>\d+(\.\d+)?) \
\s+xmax = (?P<xmax>\d+(\.\d+)?)\s+text = "(?P<text>[^"]*)"/m';
// print_r($data);
preg_match_all($pat, $data, $m);
$result = array_map(function($a){
return array_combine(['xmin', 'xmax', 'text'], $a);
}, array_map(null, $m['xmin'], $m['xmax'], $m['text']));
print_r($result);
it returns an empty array. At first it was working, but when I added a for loop to handle multiple file uploads it stopped working.
This also happened before when I tried to process the file right after it was uploaded.
Like this:
if (move_uploaded_file($_FILES["uploadedfile"]["tmp_name"], $target_file)) {
if (file_exists($target_file)) {
$data = file_get_contents($target_file);
$pat = '/intervals \[\d+\]:\s+\Kxmin = (?P<xmin>\d+(\.\d+)?) \
\s+xmax = (?P<xmax>\d+(\.\d+)?)\s+text = "(?P<text>[^"]*)"/m';
preg_match_all($pat, $data, $m);
$result = array_map(function($a){
return array_combine(['xmin', 'xmax', 'text'], $a);
}, array_map(null, $m['xmin'], $m['xmax'], $m['text']));
print_r($result);
}
}
With the above code, the RegEx also failed, since the $result array was empty. I figured that was because the file was not yet ready to be read or something, even though everything was there when I printed the contents of the file. So what I did then was redirect my page to another file that did the RegEx processing, and surprisingly it worked there.
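For readers chasing a similar symptom: printing a file shows its visible text, but not carriage returns or trailing spaces, and a pattern that expects one exact line ending is sensitive to both. The following is only a sketch of one way to check. The helper name show_whitespace and the two sample strings are hypothetical and are not taken from the uploaded file.

```php
<?php
// Hypothetical helper (not from the original post): makes CR, LF and TAB visible.
function show_whitespace(string $s): string
{
    return strtr($s, ["\r" => '\r', "\n" => '\n', "\t" => '\t']);
}

// Assumed samples: they look identical when printed and differ only in line endings.
$unix    = "xmin = 0 \n    xmax = 13.5";
$windows = "xmin = 0 \r\n    xmax = 13.5";

echo show_whitespace($unix), PHP_EOL;
echo show_whitespace($windows), PHP_EOL;

// A pattern that demands a space followed by a bare LF matches only the first sample,
// while \s+ accepts either line ending.
$strict  = '/xmin = \d+ \n\s+xmax/';
$lenient = '/xmin = \d+\s+xmax/';

var_dump(preg_match($strict, $unix));     // matches
var_dump(preg_match($strict, $windows));  // does not match
var_dump(preg_match($lenient, $windows)); // matches
```

If the dump of the real file shows \r\n or no trailing space where the pattern expects a space and a bare line feed, that difference alone would be enough to produce an empty result.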
Upvotes: 2
Views: 85
Reputation: 47991
It appears that your task is more focused on substring extraction than on validation. For this reason, you can greatly reduce the size of your pattern, speed up the execution, and minimize output bloat with the following pattern:
/xmin = (\S+)\s+xmax = (\S+)\s+text = "([^"]*)/
What have I done? (See this demo for official pattern breakdown)
I removed the leading interval... matching since you are not using it (or, more specifically, the number inside of []).
I removed \K because you don't need to "restart" the fullstring match; you aren't using it.
I removed the named capture groups because you are already using array_map() and array_combine() to assign these key names anyhow. Named capture groups cause major output array bloat and should be avoided unless you have a compelling reason to use them. They cause bloat because, when you name capture groups, preg_match_all() writes duplicate subarray elements (the named one and the indexed one), which means double the necessary data. While, yes, you can use named capture groups, this would just mean that you would change your mapping process to remove all of the indexed elements from each subarray ([0],[1],[2],[3]).
I replaced the literal space and line break in the middle of your pattern with \s+. For the record, you can use whitespace in your pattern to improve readability, but to do this you need to include x as a flag at the end of your pattern. The x pattern modifier ignores all unescaped whitespace in the pattern (outside of character classes), so beware of this effect.
I replaced (?P<xmax>\d+(\.\d+)?) with (\S+). This removes the named capture group and the nested capture group, and extracts the entire non-whitespace substring. If you DO want to validate this string, then I advise (\d+(?:\.\d+)?). This changes the nested group to "non-capturing", again reducing output array bloat.
Nothing depends on matching the closing ", so that can be removed.
I removed the m pattern modifier. You aren't using any anchor metacharacters (^ or $), so the flag has no purpose.
I added PREG_SET_ORDER as preg_match_all()'s 4th parameter, which structures your subarrays in such a way that only one array_map() is necessary to set up your multi-dimensional array.
This is how I suggest that you implement it:
Code: (Demo)
$data='intervals [1]:
xmin = 0
xmax = 13.139997023062838
text = ""
intervals [2]:
xmin = 13.139997023062838
xmax = 14.763036269953904
text = "Cities are like siblings in a large polygamous family."
intervals [3]:
xmin = 14.763036269953904
xmax = 17.01
text = ""';
$pat = '/xmin = (\S+)\s+xmax = (\S+)\s+text = "([^"]*)/';
if (preg_match_all($pat, $data, $m, PREG_SET_ORDER)) {
    // Drop the fullstring match ([0]) and key the three captures by name.
    $assoc_multidim = array_map(function ($a) {
        return array_combine(['xmin', 'xmax', 'text'], array_slice($a, 1));
    }, $m);
    var_export($assoc_multidim);
} else {
    echo "substring extraction failed";
}
Output:
array (
0 =>
array (
'xmin' => '0',
'xmax' => '13.139997023062838',
'text' => '',
),
1 =>
array (
'xmin' => '13.139997023062838',
'xmax' => '14.763036269953904',
'text' => 'Cities are like siblings in a large polygamous family.',
),
2 =>
array (
'xmin' => '14.763036269953904',
'xmax' => '17.01',
'text' => '',
),
)
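The earlier notes about the x modifier and the stricter (\d+(?:\.\d+)?) group can be combined into one readable pattern. This is only a sketch under stated assumptions: the two-interval sample string is shortened and made up for illustration, and writing each required space as \s is one choice among several (an escaped space would also work under the x flag).

```php
<?php
// Assumed, shortened sample in the same shape as the data above.
$data = "intervals [1]:\nxmin = 0\nxmax = 13.139997023062838\ntext = \"\"\n"
      . "intervals [2]:\nxmin = 13.139997023062838\nxmax = 14.763036269953904\ntext = \"Cities\"\n";

// With the x flag, unescaped whitespace and #-comments inside the pattern are ignored,
// so every space that must be matched is written explicitly as \s.
$pat = '/
    xmin \s = \s (\d+(?:\.\d+)?)   # integer or decimal, inner group is non-capturing
    \s+
    xmax \s = \s (\d+(?:\.\d+)?)   # same validation for xmax
    \s+
    text \s = \s "([^"]*)          # everything up to the next double quote
/x';

if (preg_match_all($pat, $data, $m, PREG_SET_ORDER)) {
    foreach ($m as $set) {
        // $set[0] is the fullstring match; $set[1], $set[2], $set[3] are the captures.
        echo $set[1], ' | ', $set[2], ' | ', $set[3], PHP_EOL;
    }
} else {
    echo "substring extraction failed";
}
```

Because the inner decimal group is non-capturing, each match set still holds only four elements: the fullstring match plus the three captures.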
An alternative method that makes use of your named capture groups would look like this: (Demo)
$pat = '/xmin = (?P<xmin>\S+)\s+xmax = (?P<xmax>\S+)\s+text = "(?P<text>[^"]*)/';
if (preg_match_all($pat, $data, $m, PREG_SET_ORDER)) {
    // Keep only the named elements; the indexed duplicates are discarded.
    $assoc_multidim = array_map(function ($a) {
        return array_intersect_key($a, ['xmin' => '', 'xmax' => '', 'text' => '']);
    }, $m);
    var_export($assoc_multidim);
} else {
    echo "substring extraction failed";
}
You see, both techniques require a little bit of cleanup (unless your subsequent processes don't mind the indexed subarrays), which is why I favor the less bloated array.
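To make the bloat concrete, here is a small sketch that compares the keys each pattern produces for a single match. The one-interval sample string and the variable names $named, $indexed, $n and $i are made up for illustration.

```php
<?php
// Assumed one-interval sample, just enough to produce a single match.
$data = "xmin = 0\nxmax = 17.01\ntext = \"hello\"";

$named   = '/xmin = (?P<xmin>\S+)\s+xmax = (?P<xmax>\S+)\s+text = "(?P<text>[^"]*)/';
$indexed = '/xmin = (\S+)\s+xmax = (\S+)\s+text = "([^"]*)/';

preg_match_all($named, $data, $n, PREG_SET_ORDER);
preg_match_all($indexed, $data, $i, PREG_SET_ORDER);

// Named groups: every capture appears twice, once by name and once by index.
var_export(array_keys($n[0])); // keys: 0, 'xmin', 1, 'xmax', 2, 'text', 3
echo PHP_EOL;

// Indexed groups: the fullstring match plus one element per capture.
var_export(array_keys($i[0])); // keys: 0, 1, 2, 3
echo PHP_EOL;

// After cleanup, both routes end in the same associative row.
$fromIndexed = array_combine(['xmin', 'xmax', 'text'], array_slice($i[0], 1));
$fromNamed   = array_intersect_key($n[0], array_flip(['xmin', 'xmax', 'text']));
var_dump($fromIndexed === $fromNamed);
```

Seven elements per match against four is the difference being described; on a file with many intervals that overhead is repeated for every match.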
Upvotes: 2