Reputation: 2095
I have sets of these files:
objectA-record-data.log file - contains multiple lines of data with timestamps.
objectA-record-metadata.log file - contains just one line. Think of it as metadata for all the records of objectA.
Together they form one record for processing.
I have been able to process the data file: the mapper sets objectA as the key for every line in the data file, and the reducer processes them and writes the text file. Now I want to add the metadata to each of these records as well. Any ideas how I can do that?
Upvotes: 1
Views: 414
Reputation: 28754
I guess there must be some mapping between your data file and your meta file. You can locate the meta file in the setup method of the Mapper, like the following:
@Override
protected void setup(Mapper<LongWritable, Text, Text, Text>.Context context)
        throws IOException, InterruptedException {
    FileSplit split = (FileSplit) context.getInputSplit();
    Path path = split.getPath();
    Path metaFile = getMetaFile(path); // derive the meta file path from the data file path
}
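As a sketch of what getMetaFile could do: with the naming convention from the question (a "-data.log" / "-metadata.log" suffix pair, which is an assumption about your layout), the metadata file name can be derived from the data file name by plain string manipulation, and then wrapped in a Hadoop Path next to the original:

```java
public class MetaFilePath {
    // Derives the metadata file name from the data file name using the
    // "-data.log" / "-metadata.log" convention described in the question.
    static String metaFileFor(String dataFileName) {
        String suffix = "-data.log";
        if (!dataFileName.endsWith(suffix)) {
            throw new IllegalArgumentException("not a data file: " + dataFileName);
        }
        return dataFileName.substring(0, dataFileName.length() - suffix.length())
                + "-metadata.log";
    }

    public static void main(String[] args) {
        System.out.println(metaFileFor("objectA-record-data.log"));
    }
}
```

Inside the mapper you would apply this to path.getName() and build the meta Path in the same parent directory, e.g. new Path(path.getParent(), metaFileFor(path.getName())).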
Upvotes: 0
Reputation: 16392
Use DistributedCache:
In the driver/configuration class:
DistributedCache.addCacheFile(new URI("/user/chris/theMetaDataFile.txt"), conf);
In the mapper:
public void setup(Context context) throws IOException {
    Configuration conf = context.getConfiguration();
    Path[] cachedFiles = DistributedCache.getLocalCacheFiles(conf);
    File metadataFile = new File(cachedFiles[0].toString());
    // metadataFile can now be read and the results stored locally
    // for use in the map method
}
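Once the single metadata line has been read in setup, the map method just has to attach it to every data line it emits. A minimal sketch of that join logic, shown outside Hadoop (the tab-separated output format is an assumption, not something from the question):

```java
import java.util.ArrayList;
import java.util.List;

public class MetadataJoin {
    // Appends the one metadata line to each data line, tab-separated,
    // producing the combined records described in the question.
    static List<String> join(List<String> dataLines, String metadataLine) {
        List<String> out = new ArrayList<>();
        for (String line : dataLines) {
            out.add(line + "\t" + metadataLine);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> data = List.of("2019-01-01 value1", "2019-01-02 value2");
        for (String record : join(data, "objectA-meta")) {
            System.out.println(record);
        }
    }
}
```

In the real mapper this would be context.write(new Text("objectA"), new Text(line + "\t" + metadataLine)) inside map, with metadataLine held in a field populated by setup.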
Upvotes: 1