Reputation: 317
I'm trying to import a document collection in which some filenames start with an underscore. mlcp 8.0.4 seems to skip these files silently, even though MarkLogic itself has no problem with such filenames.
This is the mlcp command I'm using:
mlcp-8.0-4/bin/mlcp.sh import -host localhost -port 8012 -username xxxxx -password xxxx -mode local -input_file_path /Users/test/Downloads/tempfolder33/ -output_uri_replace "^.*tempfolder33,''"
where filenames like "/Users/test/Downloads/tempfolder33/schemas/bwb/_manifest.xml" are consistently ignored by mlcp.
Any thoughts on how to fix this?
Upvotes: 3
Views: 240
Reputation: 504
mlcp uses the hadoop-mapreduce-client-core library (org.apache.hadoop), which defines the abstract FileInputFormat class. That class has a private static final PathFilter named hiddenFileFilter, which is always applied. It treats any file whose name starts with "_" or "." as hidden, so those files are skipped automatically, regardless of any filters you define yourself:
    private static final PathFilter hiddenFileFilter = new PathFilter() {
        public boolean accept(Path p) {
            String name = p.getName();
            return !name.startsWith("_") && !name.startsWith(".");
        }
    };
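To see the effect without a Hadoop classpath, here is a minimal stand-alone sketch that mirrors the same rule on plain file names. The class HiddenFilterDemo, the NOT_HIDDEN predicate and the list helper are made up for illustration and are not part of Hadoop or mlcp; the sketch assumes, as in FileInputFormat, that a user-supplied filter is only combined with the hidden-file rule (both must accept) and can never replace it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class HiddenFilterDemo {
    // Same rule as the hiddenFileFilter above, applied to a bare file name
    // (hypothetical stand-in; the real filter works on org.apache.hadoop.fs.Path).
    static final Predicate<String> NOT_HIDDEN =
        name -> !name.startsWith("_") && !name.startsWith(".");

    // Assumption: the user filter is AND-ed with the built-in rule,
    // so a name is listed only if both filters accept it.
    static List<String> list(List<String> names, Predicate<String> userFilter) {
        return names.stream()
                    .filter(NOT_HIDDEN.and(userFilter))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names =
            Arrays.asList("_manifest.xml", ".DS_Store", "schema.xsd", "doc_1.xml");
        // Even a user filter that accepts everything cannot bring back hidden names.
        System.out.println(list(names, n -> true)); // prints [schema.xsd, doc_1.xml]
    }
}
```

Note that only a leading "_" or "." matters: doc_1.xml is listed, while _manifest.xml is dropped before any user filter gets a say.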
If you are proficient in Java, you could download the mlcp sources from https://developer.marklogic.com/products/mlcp and override the protected listStatus method (inherited from Hadoop's FileInputFormat) in mlcp's FileAndDirectoryInputFormat class, so that it no longer applies the hiddenFileFilter from the hadoop-mapreduce-client-core library.
Hope this helps
Peter
Upvotes: 6