Andrey

Reputation: 23

Hadoop error while processing file with brackets

I have a lot of different files (*.doc, *.pdf, and so on) that I want to process with MapReduce.

I put them in HDFS and then started a Java MapReduce program using Hue.

If the file names are well formed and don't contain brackets "(){}[]", everything works fine.

But if there is a file named OPN_last_[age.PDF, I get this error:

    Failing Oozie Launcher, Main class [distr.fors.ru.Index], main() threw exception, Illegal file pattern: Unclosed character class near index 17
    OPN_last_[age.PDF
    ^
    java.io.IOException: Illegal file pattern: Unclosed character class near index 17
    OPN_last_[age.PDF
    ^
    at org.apache.hadoop.fs.GlobFilter.init(GlobFilter.java:70)
    at org.apache.hadoop.fs.GlobFilter.<init>(GlobFilter.java:49)
    at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1670)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1627)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:211)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
    at distr.fors.ru.Index.run(Index.java:78)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at distr.fors.ru.Index.main(Index.java:39)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:495)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
    Caused by: java.util.regex.PatternSyntaxException: Unclosed character class near index 17
    OPN_last_[age.PDF
    ^
    at org.apache.hadoop.fs.GlobPattern.error(GlobPattern.java:167)
    at org.apache.hadoop.fs.GlobPattern.set(GlobPattern.java:151)
    at org.apache.hadoop.fs.GlobPattern.<init>(GlobPattern.java:42)
    at org.apache.hadoop.fs.GlobFilter.init(GlobFilter.java:66)
    ... 32 more

If there is a file like {2011-01-27} (3769330).pdf, I get this error:

    Input Pattern hdfs://fd-bigdata.distr.fors.ru:8020/{2011-01-27} (3769330).pdf matches 0 files 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231) 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063) 
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080) 
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174) 
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992) 
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:396) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) 
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) 
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) 
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) 
    at distr.fors.ru.Index.run(Index.java:76) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
    at distr.fors.ru.Index.main(Index.java:37) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
    at java.lang.reflect.Method.invoke(Method.java:597) 
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:495) 
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) 
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:396) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) 
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
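
Both failures are consistent with FileInputFormat treating each input path as a glob pattern rather than as a literal file name (the first trace goes through FileSystem.globStatus and GlobFilter). The sketch below illustrates the idea using Java's own java.nio glob matcher as a stand-in; that is an assumption for demonstration only, since its syntax is close to Hadoop's for these characters but not identical, and its error messages differ from the ones in the traces. The class name GlobDemo and its helper methods are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.regex.PatternSyntaxException;

// Demonstration only: java.nio's glob syntax is used as a stand-in for Hadoop's.
public class GlobDemo {

    // Compiles `glob` with java.nio's glob syntax and tests it against `name`.
    // Throws PatternSyntaxException if `glob` is not a valid glob.
    static boolean matches(String glob, String name) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(name));
    }

    // Treats a file name as a glob (the way the input path is treated) and
    // reports whether that glob matches the very same name taken literally.
    static String tryAsGlob(String name) {
        try {
            return matches(name, name) ? "matches itself" : "does not match itself";
        } catch (PatternSyntaxException e) {
            return "invalid glob: " + e.getDescription();
        }
    }

    public static void main(String[] args) {
        // '[' opens a character class that is never closed.
        System.out.println(tryAsGlob("OPN_last_[age.PDF"));
        // '{...}' is an alternation group, so the braces are not part of the match.
        System.out.println(tryAsGlob("{2011-01-27} (3769330).pdf"));
        // What that glob actually matches: the same name without the braces.
        System.out.println(matches("{2011-01-27} (3769330).pdf", "2011-01-27 (3769330).pdf"));
        // A name without glob metacharacters matches itself.
        System.out.println(tryAsGlob("report.pdf"));
    }
}
```

Run as-is, it should report that the first name is an invalid glob (an unclosed `[`) and that the second name does not match itself, while the brace glob does match `2011-01-27 (3769330).pdf`. That fits the "matches 0 files" message, since no file with the brace-free name exists.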

I really need to process such files. What can I do to solve these problems?

P.S. I am using the latest CDH 4.4.0.

Upvotes: 2

Views: 1826

Answers (1)

Viacheslav Rodionov

Reputation: 2345

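To deal with glob special characters such as [ and { in Java, escape them with a backslash. Inside a Java string literal that backslash must itself be escaped, so it is written as a double backslash "\\":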

'[' => '\\['
'}' => '\\}' 

This works for me in Java, in Pig, and in Oozie. I hope it solves your problem too.
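
The escaping rule can be wrapped in a small helper that puts a backslash in front of every glob metacharacter in a file name before the name is turned into an input path. This is a minimal sketch: the class and method names are hypothetical, and the character set (\ * ? [ ] { }) is an assumption based on the usual Hadoop glob syntax. Parentheses are left alone because, as far as I can tell, Hadoop's glob parser already treats them literally; adding them to the set should be harmless if your version behaves differently.

```java
import java.util.Arrays;
import java.util.List;

public class GlobEscape {

    // Assumed set of glob metacharacters: '*' and '?' are wildcards,
    // '[' ... ']' is a character class, '{' ... '}' is an alternation group
    // and '\' is the escape character itself. Escaping ']' and '}' is not
    // strictly required, but it does no harm.
    private static final String GLOB_SPECIAL = "\\*?[]{}";

    // Returns `name` with a backslash in front of every glob metacharacter,
    // so that the resulting glob matches the file name literally.
    static String escapeGlob(String name) {
        StringBuilder sb = new StringBuilder(name.length() + 8);
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (GLOB_SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "OPN_last_[age.PDF",            // prints OPN_last_\[age.PDF
                "{2011-01-27} (3769330).pdf");  // prints \{2011-01-27\} (3769330).pdf
        for (String name : names) {
            System.out.println(escapeGlob(name));
        }
    }
}
```

In the job driver this might be used as `FileInputFormat.addInputPath(job, new Path(inputDir, GlobEscape.escapeGlob(fileName)))`, where `inputDir` and `fileName` are hypothetical variables. Only the file-name component is escaped, so the directory part of the path is left untouched.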

Upvotes: 2
