I'm trying to assess a string based on the suffix of the files that it contains.
I need to differentiate between strings that contain only image files (.png, .gif, .jpg, .jpeg, or .bmp) and strings which contain a mixture of image and non-image files.
What am I doing wrong?
if (preg_match('~\.(png\)|gif\)|jpe?g\)|bmp\))~', $data->files)) {
echo 'image only;'
} else {
echo 'image + other types';
}
Example string containing a mixture:
filename 1 (https://example.com/test.pdf),
filename 2 (https://example.com/cool_image.jpg),
filename 3 (https://example.com/other-file.docx)
Example string containing only images:
filename 1 (https://example.com/another.png),
filename 2 (https://example.com/cool_image.jpg)
Upvotes: 1
Views: 2729
Reputation: 9957
The regular expression is wrong. You have an escaped parenthesis, \), after each extension. This will work:
~\.(png|gif|jpe?g|bmp)~i
Complete example:
<?php
if (preg_match('~\.(png|gif|jpe?g|bmp)~i', "https://example.com/test.png")) {
echo 'image only';
}
else {
echo 'image + other types';
}
With the corrected regex, you can now check whether the batch of files contains only images, a mix of images and other files, or only non-image files. We already have the first part down (checking whether there are images). With this regex, we can check whether there are non-images:
/^(?!.*[.](png|gif|jpe?g|bmp))(?:.*$|,)/im
It uses a negative lookahead to assert that none of the image extensions appear on the line. At the end there's a non-capturing group that matches up to the end of the line or a comma (to comply with your format).
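To see what the lookahead pattern actually matches, here is a minimal sketch that runs it with preg_match_all() over a small batch. The sample batch and the variable names are assumptions made up for illustration; only the pattern comes from the answer.

```php
<?php
// Hypothetical sample batch: one file per line, as in the question's format.
$batch = "filename 1 (https://example.com/test.pdf),\n"
       . "filename 2 (https://example.com/cool_image.jpg),\n"
       . "filename 3 (https://example.com/other-file.docx)";

// Same pattern as above: match any line that contains no image extension.
$nonImagePattern = '/^(?!.*[.](png|gif|jpe?g|bmp))(?:.*$|,)/im';

// With the m flag, ^ is tried at the start of every line, so
// preg_match_all() collects each line that passes the lookahead.
$count = preg_match_all($nonImagePattern, $batch, $matches);

echo "Lines without an image extension: $count" . PHP_EOL;
foreach ($matches[0] as $line) {
    echo " - $line" . PHP_EOL;
}
```

Here the two non-image lines are collected and the .jpg line is skipped. Note that the lookahead rejects a line if an image extension appears anywhere on it, so this relies on each file being on its own line.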
So finally, check both regular expressions and see what each batch really contains:
$files=[
'Non-Images Only'=>'filename 1 (https://example.com/test.exe)',
'Mixed-Type'=>'filename 1 (https://example.com/test.pdf),
filename 2 (https://example.com/cool_image.jpg),
filename 3 (https://example.com/other-file.docx),
filename 4 (https://example.com/nice_image.png)',
'Images-Only'=>'filename 1 (https://example.com/another.png),
filename 2 (https://example.com/cool_image.jpg))'];
foreach ($files as $type => $batch) {
    echo "Batch: ".$batch.PHP_EOL;
    echo "Expecting: ".$type.PHP_EOL;
    // 1 if an image extension appears anywhere in the batch
    $images = preg_match('/\.(png|gif|jpe?g|bmp)/im', $batch);
    // 1 if at least one line contains no image extension
    $nonImages = preg_match('/^(?!.*[.](png|gif|jpe?g|bmp))(?:.*$|,)/im', $batch);
    if ($images && $nonImages) {
        $result = "Mixed-Type";
    }
    elseif ($images) {
        $result = "Images-Only";
    }
    else {
        $result = "Non-Images Only";
    }
    echo "Result: ".$result.PHP_EOL;
    echo PHP_EOL;
}
Note: used @mickmackusa's list of tests
Upvotes: 4
Reputation: 48071
After reading and re-reading your question more than 20 times, I think I know what you are trying to do.
For every string (batch of files), I run two preg_match() checks. One that seeks files with a suffix of png, gif, jpg, jpeg, or bmp. Another that seeks files that DO NOT have a suffix in the aforementioned list.
*Note: (*SKIP)(*FAIL) is a technique used to match and immediately disqualify characters in a pattern.
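As a rough illustration of how (*SKIP)(*FAIL) behaves, the sketch below runs a pattern of the same shape over a made-up three-line string. The input string and variable names are assumptions for demonstration only.

```php
<?php
// Hypothetical input: three file names, each followed by ")" as in the question's format.
$string = "a.png)\nb.pdf)\nc.jpeg)";

// Image extensions are matched first and then thrown away by (*SKIP)(*FAIL);
// any other run of characters after the dot is kept as a match.
$pattern = '~\.(?:(?:png|gif|jpe?g|bmp)(*SKIP)(*FAIL)|[^.)]+)\)$~im';

preg_match_all($pattern, $string, $matches);

// Only the non-image suffix survives.
print_r($matches[0]);
```

The image suffixes are consumed and discarded, so the only element left in $matches[0] is ".pdf)".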
Code:
$tests=[
'Non-Images Only'=>'filename 1 (https://example.com/test.exe)',
'Mixed-Type'=>'filename 1 (https://example.com/test.pdf),
filename 2 (https://example.com/cool_image.jpg),
filename 3 (https://example.com/other-file.docx),
filename 4 (https://example.com/nice_image.png)',
'No Files'=>'filename 1 (),
filename 2 ()',
'Images-Only'=>'filename 1 (https://example.com/another.png),
filename 2 (https://example.com/cool_image.jpg))'];
$image_pattern='~\.(?:png|gif|jpe?g|bmp)\),?$~im';
$non_image_pattern='~\.(?:(?:png|gif|jpe?g|bmp)(*SKIP)(*FAIL)|[^.)]+)\),?$~im';
foreach($tests as $type=>$string){
    echo "\t\tAssessing:\n---\n";
    echo "$string\n---\n";
    echo "Expecting: $type\n";
    echo "Assessed as: ";
    $has_image=preg_match($image_pattern,$string);
    $has_non_image=preg_match($non_image_pattern,$string);
    if($has_image){
        if($has_non_image){
            echo "Mix of image and non-image files";
        }else{
            echo "Purely image files";
        }
    }else{
        if($has_non_image){
            echo "Purely non-image files";
        }else{
            echo "No files recognized";
        }
    }
    echo "\n----------------------------------------------------\n";
}
Output:
Assessing:
---
filename 1 (https://example.com/test.exe)
---
Expecting: Non-Images Only
Assessed as: Purely non-image files
----------------------------------------------------
Assessing:
---
filename 1 (https://example.com/test.pdf),
filename 2 (https://example.com/cool_image.jpg),
filename 3 (https://example.com/other-file.docx),
filename 4 (https://example.com/nice_image.png)
---
Expecting: Mixed-Type
Assessed as: Mix of image and non-image files
----------------------------------------------------
Assessing:
---
filename 1 (),
filename 2 ()
---
Expecting: No Files
Assessed as: No files recognized
----------------------------------------------------
Assessing:
---
filename 1 (https://example.com/another.png),
filename 2 (https://example.com/cool_image.jpg))
---
Expecting: Images-Only
Assessed as: Purely image files
----------------------------------------------------
Upvotes: 1
Reputation: 42384
You're escaping your brackets, so they're getting treated literally.
The regex you're looking for is simply: ~\.(png|gif|jpe?g|bmp)$~
if (preg_match('~\.(png|gif|jpe?g|bmp)$~', $data->files)) {
    echo 'image only';
}
else {
    echo 'image + other types';
}
Note that the $ at the end, which denotes the end of the string, is critical; without it, a match anywhere in the string would be valid. As such, a file such as .jpg.exe would be considered an 'image'.
Running the regex \.(png|gif|jpe?g|bmp)$ against the strings:
https://example.com/test.pdf
https://example.com/other-file.docx
https://example.com/cool_image.jpg.exe
https://example.com/cool_image.jpg
shows that only the final link will match.
Note that you'll also probably want to add the i modifier to the end of your regex to allow for uppercase file extensions as well. This can be done with ~\.(png|gif|jpe?g|bmp)$~i.
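An end-anchored pattern tests one URL at a time, whereas the question's strings hold several URLs wrapped in parentheses. One possible way to bridge the two, sketched here under the assumption that every URL sits between ( and ) with no spaces, is to extract the URLs first and test each one separately. The function name classifyBatch and its return labels are hypothetical and do not come from any of the answers.

```php
<?php
// Hypothetical helper: classify a batch by extracting each parenthesised
// URL and testing it on its own with an end-anchored pattern.
function classifyBatch(string $batch): string
{
    // Capture whatever sits between "(" and ")" without spaces or nested parentheses.
    preg_match_all('~\(([^()\s]+)\)~', $batch, $found);
    $urls = $found[1];

    if ($urls === []) {
        return 'no files';
    }

    $images = 0;
    foreach ($urls as $url) {
        // Anchored with $, so ".jpg.exe" does not count as an image.
        if (preg_match('~\.(png|gif|jpe?g|bmp)$~i', $url)) {
            $images++;
        }
    }

    if ($images === count($urls)) {
        return 'image only';
    }
    return $images > 0 ? 'image + other types' : 'non-image only';
}

echo classifyBatch("filename 1 (https://example.com/another.png),\nfilename 2 (https://example.com/cool_image.jpg)") . PHP_EOL;
echo classifyBatch("filename 1 (https://example.com/test.pdf),\nfilename 2 (https://example.com/cool_image.jpg)") . PHP_EOL;
```

Checking each URL individually keeps the $ anchor meaningful, at the cost of depending on the parenthesised layout shown in the question.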
Upvotes: 1