Using version 4.2.0 of pypdf, I would like to access XMP metadata from a file. The xmp_metadata property allows this (it is documented on Read the Docs) and provides access to many standard items as properties (e.g., dc_date). However, the accessible data is not always complete: there are many metadata items that I can see using a PDF reader but cannot read using pypdf.
So, my question is this: can other metadata elements be accessed in some way?
I suspect that the XmpInformation.get_element method would allow this. If so, can anyone explain how to use it, perhaps by example?
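For reference, here is a minimal sketch of how I imagine the call might look. It assumes that get_element(about_uri, namespace, name) yields xml.dom.minidom attribute or element nodes for each matching rdf:Description; I have not confirmed that this is the intended usage. The helpers node_text, element_texts and prism_doi_from_pdf are hypothetical names of my own, not part of pypdf.

```python
from typing import Any, List

# Namespace URI that the sample metadata binds to the "prism" prefix.
PRISM_NS = "http://prismstandard.org/namespaces/basic/2.0/"


def node_text(node: Any) -> str:
    """Return the text carried by a minidom attribute or element node."""
    if node.nodeType == node.ATTRIBUTE_NODE:
        return node.value
    # Element node: join the data of its direct text children.
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


def element_texts(xmp: Any, namespace: str, name: str,
                  about_uri: str = "") -> List[str]:
    """Collect the text of every node yielded by xmp.get_element(...).

    Assumption: `xmp` behaves like pypdf's XmpInformation and its
    get_element() yields minidom nodes.
    """
    return [node_text(n).strip()
            for n in xmp.get_element(about_uri, namespace, name)]


def prism_doi_from_pdf(path: str) -> List[str]:
    """Hypothetical convenience wrapper: read prism:doi from a PDF file."""
    from pypdf import PdfReader  # third-party; imported only when called

    xmp = PdfReader(path).xmp_metadata  # None when the file has no XMP
    if xmp is None:
        return []
    return element_texts(xmp, PRISM_NS, "doi")
```

If this guess is right, prism_doi_from_pdf("paper.pdf") would return a list with one DOI string for the file described below ("paper.pdf" is a placeholder name). The about_uri argument is left as the empty string because every rdf:Description in my sample has rdf:about="".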
If pypdf cannot access other metadata elements, which other Python packages should I look at using? EDIT: See: this answer
Additional information
As an example, here is the XMP metadata embedded in a PDF file for a scientific paper published in the Institute of Physics journal Metrologia.
I would just like to know if it is possible to access some of this information (e.g., prism:doi) with the help of pypdf.
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.2-c003 61.141987, 2011/02/22-12:03:51">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/"
xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
xmlns:xap="http://ns.adobe.com/xap/1.0/"
xmlns:xapRights="http://ns.adobe.com/xap/1.0/rights/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
xmlns:prism="http://prismstandard.org/namespaces/basic/2.0/"
xmlns:fr="http://www.crossref.org/fundref.xsd"
xmlns="http://www.crossref.org/schema/4.3.3"
xmlns:crossmark="http://crossref.org/crossmark/1.0/">
<rdf:Description rdf:about=""
xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
<pdfx:doi>10.1088/0026-1394/52/4/613</pdfx:doi>
<pdfx:robots>noindex</pdfx:robots>
<pdfx:CrossMarkMajorVersionDate>2015-8-3</pdfx:CrossMarkMajorVersionDate>
<pdfx:CrossmarkDomainExclusive>true</pdfx:CrossmarkDomainExclusive>
<pdfx:CrossMarkDomains><rdf:Seq><rdf:li>iop.org</rdf:li></rdf:Seq></pdfx:CrossMarkDomains>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:xap="http://ns.adobe.com/xap/1.0/">
<xap:CreatorTool>IOPP</xap:CreatorTool>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:xapRights="http://ns.adobe.com/xap/1.0/rights/">
<xapRights:Marked>True</xapRights:Marked>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:format>application/pdf</dc:format>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">Comment on ‘Dimensionless units in the SI’</rdf:li>
</rdf:Alt>
</dc:title>
<dc:creator>
<rdf:Seq><rdf:li>B P Leonard</rdf:li>
</rdf:Seq>
</dc:creator>
<dc:publisher>
<rdf:Bag>
<rdf:li>IOP Publishing</rdf:li>
</rdf:Bag>
</dc:publisher>
<dc:identifier>doi:10.1088/0026-1394/52/4/613</dc:identifier>
<dc:description>Metrologia, 52 (2015) 613. doi: 10.1088/0026-1394/52/4/613</dc:description>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:prism="http://prismstandard.org/namespaces/basic/2.0/">
<prism:aggregationType>journal</prism:aggregationType>
<prism:publicationName>Metrologia</prism:publicationName>
<prism:copyright>© 2015 BIPM &amp; IOP Publishing Ltd</prism:copyright>
<prism:issn>0026-1394</prism:issn>
<prism:startingPage>613</prism:startingPage>
<prism:endingPage>616</prism:endingPage>
<prism:pageRange>613</prism:pageRange>
<prism:doi>10.1088/0026-1394/52/4/613</prism:doi>
<prism:url>http://dx.doi.org/10.1088/0026-1394/52/4/613</prism:url>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:crossmark="http://crossmark.crossref.org">
<crossmark:MajorVersionDate>2015-8-3</crossmark:MajorVersionDate>
<crossmark:CrossmarkDomainExclusive>true</crossmark:CrossmarkDomainExclusive>
<crossmark:DOI>10.1088/0026-1394/52/4/613</crossmark:DOI>
<crossmark:CrossMarkDomains><rdf:Seq><rdf:li>iop.org</rdf:li></rdf:Seq></crossmark:CrossMarkDomains>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
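As a fallback that does not depend on pypdf, a packet like the one above could presumably be parsed with the standard library alone. The sketch below assumes the raw XMP packet is already available as a well-formed XML string (how to extract it from the PDF is not shown). SAMPLE_PACKET is a shortened copy of my metadata, and find_texts is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace URI that the sample metadata binds to the "prism" prefix.
PRISM_NS = "http://prismstandard.org/namespaces/basic/2.0/"

# Shortened copy of the packet shown above, for illustration only.
SAMPLE_PACKET = """<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
 xmlns:prism="http://prismstandard.org/namespaces/basic/2.0/">
<prism:publicationName>Metrologia</prism:publicationName>
<prism:doi>10.1088/0026-1394/52/4/613</prism:doi>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""


def find_texts(packet: str, namespace: str, name: str) -> List[str]:
    """Return the text of every element with the given namespace and name."""
    root = ET.fromstring(packet)
    # ElementTree spells qualified names as "{namespace-uri}local-name".
    tag = "{%s}%s" % (namespace, name)
    return [(el.text or "").strip() for el in root.iter(tag)]


print(find_texts(SAMPLE_PACKET, PRISM_NS, "doi"))
```

Matching on the namespace URI rather than the prism: prefix matters here, because the prefix is only a local alias and another file could bind the same namespace to a different prefix.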