Reputation: 23
I'm working with thousands of METS files and need to identify the ones that do not contain particular strings, e.g. mods:genre. I am searching the contents of the files, not the file names, and I want the files that lack specific content.
I have tried searching for Java regex syntax because that is apparently the regex flavor that Oxygen uses. All I can find is full Java code samples. I am still very new to regex and hope that someone on these boards has already figured out how to do what I need to do.
Here is an example metadata file: https://uflorida-my.sharepoint.com/:u:/g/personal/gwswicord_ufl_edu/EeHF7UHXSX1NqbkbIrB8FWMBKIC_UTWPnV5fwPbZBXhSNg?e=xt5q0n. It is part of a set of over 39,000 files. It does not contain the tag <mods:genre authority="aat">theses</mods:genre>. I need to identify all files in the set that also lack that tag.
In the Oxygen Find/Replace in Files dialog box, in the Text to find box with the Regular expression check box selected, I have tried: (?s)\A((?!<mods:genre authority="aat">theses</mods:genre>).)+\z
It didn't return any results.
Regards, G.W.
Upvotes: -1
Views: 111
Reputation: 52858
If you're looking at XML files in oXygen, I'm not sure why you'd use regex in a find.
You should use "XPath in Files..." (which just opens the XPath/XQuery Builder where scope is set).
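Try this XPath. It selects the root element only when the file has no genre element (in any namespace) with authority="aat" and the value "theses", so the files listed in the results are the ones that lack the tag: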
/*[not(.//*:genre[@authority='aat'][.='theses'])]
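If running the check outside oXygen is an option, here is a rough Python sketch of the same test using only the standard library. The function names lacks_genre and files_lacking_genre are made up for illustration, and the "*.xml" file pattern is an assumption about how your METS files are named, so adjust it to your set. Like the XPath, it ignores the namespace and matches on the local name genre.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def lacks_genre(xml_data, authority="aat", value="theses"):
    """Return True if no genre element (in any namespace) has the
    given authority attribute and text value."""
    root = ET.fromstring(xml_data)
    for elem in root.iter():
        # ElementTree tags look like '{namespace-uri}local'; keep the local part.
        local = elem.tag.rsplit("}", 1)[-1]
        if (local == "genre"
                and elem.get("authority") == authority
                and "".join(elem.itertext()) == value):
            return False
    return True


def files_lacking_genre(folder, pattern="*.xml"):
    """Yield the files under folder (recursively) that lack the genre element.

    The pattern is an assumption; change it to match your file names.
    """
    for path in sorted(Path(folder).rglob(pattern)):
        # Read bytes so the parser honors the XML encoding declaration.
        if lacks_genre(path.read_bytes()):
            yield path
```

Something like `for p in files_lacking_genre("path/to/mets"): print(p)` would then list the files to fix. A file that is not well-formed XML raises ET.ParseError here, so you may want to catch that and log the file name instead of stopping.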
Upvotes: 0