Peter Bakker

Reputation: 193

Harvesting single items with DSpace

Is it possible to harvest single items from other repositories with DSpace, perhaps from the command line? As far as I can see, XMLUI only supports harvesting complete communities or complete collections, which usually pulls in many items I don't need.

Upvotes: 2

Views: 1174

Answers (3)

Peter Bakker

Reputation: 193

As Terry wrote, you can harvest a single item/document from a repository with a GetRecord request. Items can then be imported with the DSpace menu item 'Batch Import (ZIP)', provided the content of the ZIP has a specific format.

The following PHP code extracts the metadata from the XML returned by GetRecord. In the next step this metadata is packed into an XML format that DSpace understands. This XML is added as a file (dublin_core.xml) to the created ZIP, together with a small file (handle) containing the handle. Finally the ZIP is written to the server.

BTW, importing the ZIP file can also be done from the command line, as Terry mentioned in his first answer.

<?php 
// handle and harvest-string
$handle  = "1874/1506";
$harvest = "http://dspace.library.uu.nl/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:dspace.library.uu.nl:" . $handle;

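// get XML from source repository; stop if the request or the parsing fails
$sxe = simplexml_load_file($harvest, "SimpleXMLElement");
if ($sxe === false) {
    exit("Could not load or parse the GetRecord response\n");
}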

// add namespace schema-urls
$sxe->registerXPathNamespace('oai_dc', 'http://www.openarchives.org/OAI/2.0/oai_dc/');
$sxe->registerXPathNamespace('dc', 'http://purl.org/dc/elements/1.1/');

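// collect the Dublin Core (dc) element names and values in an array
$elements = [];
foreach ($sxe->xpath("//oai_dc:dc") as $entry) {
    // select the children by namespace URI so the code does not depend on the prefix used in the response
    foreach ($entry->children('http://purl.org/dc/elements/1.1/') as $elementName => $elementValue) {
        // cast to string so the array holds text, not SimpleXMLElement objects
        $elements[$elementName][] = (string) $elementValue;
    }
}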

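// create zip-object and -file (the doc/ directory must already exist)
$zip = new ZipArchive();
if ($zip->open("doc/importZip.zip", ZipArchive::CREATE) !== true) {
    exit("Could not create doc/importZip.zip\n");
}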

// create a directory in the zip-object
$zip->addEmptyDir("item");

// create Dublin Core XML object
$oXML = new DOMDocument();
$oXML->encoding      = "UTF-8";
$oXML->formatOutput  = true;
$oXML->xmlStandalone = false;

$oRoot = $oXML->createElement('dublin_core');
$oRoot->setAttribute('schema', 'dc');
$oXML->appendChild($oRoot);

// add elements and their values to XML object
foreach($elements as $elementName => $elementValues) {
    foreach($elementValues as $elementValue) {
        $oDcValue = $oXML->createElement('dcvalue');
        $oDcValue->setAttribute('element', $elementName);
        $oText = $oXML->createTextNode($elementValue);
        $oDcValue->appendChild($oText);
        $oRoot->appendChild($oDcValue);
    }
}

// save created XML to string
$dublinCoreXml = $oXML->saveXML();

// add XML-string as file to zip-object
$zip->addFromString("item/1/dublin_core.xml", $dublinCoreXml);

// add handle as file to zip-object
$zip->addFromString("item/1/handle", $handle);

$zip->close();

?>
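
Before importing, it can help to check what actually ended up in the ZIP. The following is only a minimal sketch: listZipEntries() is a hypothetical helper that is not part of the script above, and the path doc/importZip.zip is taken from that script.

```php
<?php
// Hypothetical helper (an assumption, not part of the original script):
// return the names of all entries stored in a ZIP archive.
function listZipEntries(string $zipPath): array
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath) !== true) {
        throw new RuntimeException("Cannot open $zipPath");
    }
    $names = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $names[] = $zip->getNameIndex($i);
    }
    $zip->close();
    return $names;
}

// List the entries of the ZIP written by the script above, if it exists.
if (file_exists("doc/importZip.zip")) {
    print_r(listZipEntries("doc/importZip.zip"));
}
```

The list should include the dublin_core.xml and handle files under the item directory; if either is missing, the script did not finish writing the archive.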

Upvotes: 2

terrywb

Reputation: 3956

The OAI-PMH standard defines a GetRecord verb, which retrieves a single record.

https://knb.ecoinformatics.org/knb/docs/oaipmh.html

If you navigate the set containing your item of interest, you should be able to find the item's identifier. You can use that identifier as a parameter to GetRecord.

Example: https://repository.library.georgetown.edu/oai/request?verb=GetRecord&identifier=oai:repository.library.georgetown.edu:10822/503788&metadataPrefix=qdc

This would allow you to extract the item metadata. In order to get the item into DSpace, I imagine that you would need to package the item for ingest into the repository.
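
The GetRecord request shown above can also be assembled in code. This is a minimal sketch in PHP: buildGetRecordUrl() is a hypothetical helper name, and the base URL, identifier and metadata prefix are simply the ones from the example link.

```php
<?php
// Hypothetical helper (an assumption for illustration): build an OAI-PMH GetRecord URL
// from the repository's OAI base URL, the item identifier and a metadata prefix.
function buildGetRecordUrl(string $baseUrl, string $identifier, string $metadataPrefix = 'oai_dc'): string
{
    $query = http_build_query([
        'verb'           => 'GetRecord',
        'identifier'     => $identifier,
        'metadataPrefix' => $metadataPrefix,
    ]);
    return $baseUrl . '?' . $query;
}

echo buildGetRecordUrl(
    'https://repository.library.georgetown.edu/oai/request',
    'oai:repository.library.georgetown.edu:10822/503788',
    'qdc'
), "\n";
```

Note that http_build_query() percent-encodes the ":" and "/" characters in the identifier, so the printed URL looks different from the hand-written one but should address the same record.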

Upvotes: 2

terrywb

Reputation: 3956

If you are looking to pull a single item via the command line, consider the packager command.

https://wiki.duraspace.org/display/DSDOC5x/Importing+and+Exporting+Content+via+Packages

Upvotes: 1
