Reputation: 1172
I am trying to extract the camera make & model from exifdata.
The exifdata itself is quite long, 4 lines follow:
JPEG.APP1.Ifd0.ImageDescription = ' '
JPEG.APP1.Ifd0.Make = 'Canon'
JPEG.APP1.Ifd0.Model = 'Canon PowerShot S120'
JPEG.APP1.Ifd0.Orientation = 1 = '0,0 is top left'
I use the following regexs, but they do not match. Are the patterns correct?
make := substring( meta from 'Make\\s+=\\s+(.*)');
model := substring( meta from 'Model\\s+=\\s+(.*)');
Upvotes: 1
Views: 115
Reputation: 29974
subtring([str] from [pattern])
doesn't work like you seem to think it does. You can find the details of how it works here: 9.7.2. SIMILAR TO Regular Expressions. That's the regular expression syntax your call uses.
For starters, there's this relevant bit of info:
As with
SIMILAR TO
, the specified pattern must match the entire data string, or else the function fails and returns null.
Your regular expressions clearly don't match the entire string.
Second is the next sentence:
To indicate the part of the pattern that should be returned on success, the pattern must contain two occurrences of the escape character followed by a double quote (")
This isn't quite the standard regex, so you need to be aware of it.
Rather than trying to get subtring([str] from [pattern])
working, I'm going to recommend an alternative: regexp_matches
. This function uses standard POSIX regex syntax, and it returns a text[]
containing all the captured groups from the match. Here's a quick test to show that it works:
SELECT regexp_matches($$JPEG.APP1.Ifd0.ImageDescription = ' '
JPEG.APP1.Ifd0.Make = 'Canon'
JPEG.APP1.Ifd0.Model = 'Canon PowerShot S120'
JPEG.APP1.Ifd0.Orientation = 1 = '0,0 is top left'$$, '(Make)') m;
(I'm using dollar quoting for your example string, in case you're not familiar with that syntax.)
This gives back the array {Make}
.
Second, your regex actually doesn't work, as I found out in my testing. You have two problems:
The double slashes are incorrect. You don't need to escape the \
as PostgreSQL doesn't treat it as an escape character by default. You can read about escaping in strings here; the most relevant section is probably 4.1.2.2. String Constants with C-style Escapes. That section describes what you thought was happening by default, but it actually requires an E
prefix to enable.
Fixing that improves the result:
SELECT regexp_matches($$JPEG.APP1.Ifd0.ImageDescription = ' '
JPEG.APP1.Ifd0.Make = 'Canon'
JPEG.APP1.Ifd0.Model = 'Canon PowerShot S120'
JPEG.APP1.Ifd0.Orientation = 1 = '0,0 is top left'$$, 'Make\s+=\s+(.*)') m;
now gives an array containing this string:
'Canon'
JPEG.APP1.Ifd0.Model = 'Canon PowerShot S120'
JPEG.APP1.Ifd0.Orientation = 1 = '0,0 is top left'
This brings us to...
The (.*)
is matching everything to the end of the string, not the end of the line. You can actually fix this by doing something you probably want to do anyway: get the single quote marks out of the match. You can use this pattern to do that:
$$Make\s+=\s+'([^']+)'$$
I've used dollar quoting again, this time to avoid the ugliness of escaping all those single quote marks. Now the query is:
SELECT regexp_matches($$JPEG.APP1.Ifd0.ImageDescription = ' '
JPEG.APP1.Ifd0.Make = 'Canon'
JPEG.APP1.Ifd0.Model = 'Canon PowerShot S120'
JPEG.APP1.Ifd0.Orientation = 1 = '0,0 is top left'$$, $$Make\s+=\s+'([^']+)'$$) m;
which gives you pretty much exactly what you want: an array containing just the string Canon
. You'll need to extract the result from the array, of course, but I'll leave that as an exercise for you.
That should be enough info for you to get the second expression working, too.
P.S. PostgreSQL's truly fine manual is a blessing.
Upvotes: 3