pravali thota

Hadoop jar command to run a Python mapper and reducer

I'm trying to run a Python mapper and reducer on Hadoop (connected through PuTTY) to print the year and temperature, extracted from the zip files in my folder, as keys and values. The reducer should print the keys and values and save them to the output folder temperatures_years.

I've added the Python code for the mapper and reducer, as well as the commands I used in PuTTY.

I don't get any output after running the hadoop jar command in PuTTY. Please help me find what's wrong. Thank you!

Here are the details:

Mapper:

mapper.py

#!/usr/bin/env python3
import sys
import zipfile
import io
import os

input_folder = sys.argv[1]  # Get the input folder path from command-line arguments

# Iterating over zip files in the input folder
for filename in os.listdir(input_folder):
    if filename.endswith('.zip'):
        with zipfile.ZipFile(os.path.join(input_folder, filename), 'r') as zip_file:
            for inner_filename in zip_file.namelist():
                with zip_file.open(inner_filename) as file:
                    for line in io.TextIOWrapper(file):
                        # Process each line from the file
                        val = line.strip()
                        (year, temp) = (val[15:19], int(val[87:92]))
                        if temp == 9999:
                            sys.stderr.write("reporter:counter:Temperature,Missing,1\n")
                        else:
                            # Emit year and temperature as a tab-separated key/value pair
                            print("%s\t%d" % (year, temp))
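Hadoop Streaming starts the mapper with no command-line arguments and feeds it the input split line by line on stdin, so sys.argv[1] in the mapper above fails as soon as the task starts on the cluster. A minimal sketch of a stdin-based mapper, assuming the input has already been unzipped (or recompressed as .gz) so that each line is one fixed-width record with the year at [15:19] and the temperature at [87:92], as in the question. The helper name parse_records is hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop Streaming mapper for fixed-width weather records.

Assumption: the input is plain text (or .gz), one record per line, because
Streaming hands the mapper text lines on stdin rather than a folder path.
"""
import sys
from typing import Iterable, Iterator, Tuple

MISSING = 9999  # value the question's code treats as a missing reading


def parse_records(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (year, temperature) pairs, skipping missing or malformed lines."""
    for line in lines:
        val = line.rstrip("\n")
        if len(val) < 92:
            continue  # too short to contain the year and temperature fields
        year = val[15:19]
        try:
            temp = int(val[87:92])  # e.g. "+0022" -> 22, "-0011" -> -11
        except ValueError:
            continue  # non-numeric temperature field
        if temp == MISSING:
            # Streaming turns this stderr line into a job counter
            sys.stderr.write("reporter:counter:Temperature,Missing,1\n")
            continue
        yield year, temp


def main() -> None:
    for year, temp in parse_records(sys.stdin):
        # Tab is Streaming's default key/value separator
        print(f"{year}\t{temp}")


if __name__ == "__main__":
    main()
```

Keeping the parsing in a function that accepts any iterable of lines makes it easy to try on a few sample records locally before submitting the job.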

Reducer:

#!/usr/bin/env python3
# reducer
import sys

# Function to process the input key and values
def process_input(key, values):
    # Print the key and all the associated values
    for value in values:
        print(key, value)

# Initializing variables to hold key-value pairs
current_key = None
current_values = []

# Iterating over lines of input received from mapper
for line in sys.stdin:
    # Splitting the line into key and value
    key, value = line.strip().split('\t', 1)

    # If the key has changed, process the previous key-value pair
    if key != current_key:
        if current_key is not None:
            process_input(current_key, current_values)
        current_key = key
        current_values = []

    # Add the value to the list of values for the current key
    current_values.append(value)

# Processing the last key-value pair
if current_key is not None:
    process_input(current_key, current_values)
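Hadoop sorts the mapper output by key before the reducer runs, so lines with the same key arrive next to each other and the reducer above only has to notice when the key changes. The same grouping can be sketched with itertools.groupby, which walks each run of equal keys without buffering the values in a list. This version also skips lines that contain no tab; in the reducer above such a line makes the unpacking raise ValueError, which fails the reduce task. The helper names split_pairs and reduce_pairs are hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of a Streaming reducer that prints every (key, value) pair by key."""
import sys
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def split_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Split 'key<TAB>value' lines, skipping lines without a tab."""
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # malformed line: would otherwise crash the unpacking
        key, value = line.split("\t", 1)
        yield key, value


def reduce_pairs(lines: Iterable[str]) -> Iterator[str]:
    """Yield one output line per value, grouped under its (already sorted) key."""
    for key, group in groupby(split_pairs(lines), key=lambda kv: kv[0]):
        for _, value in group:
            yield f"{key}\t{value}"


if __name__ == "__main__":
    for out in reduce_pairs(sys.stdin):
        print(out)
```

groupby only merges adjacent equal keys, which is exactly what the shuffle-and-sort phase guarantees; outside Hadoop, the input would need to be sorted first.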

Commands used:

Setting the HADOOP_CLASSPATH environment variable:

export HADOOP_CLASSPATH=/home/student93/

Copying the CourseProjectData files and the mapper and reducer Python files from the server's hard drive to HDFS:

hdfs dfs -copyFromLocal /home/student93/Data /home/93student93/
hdfs dfs -copyFromLocal /home/student93/project_mapper1.py /home/93student93/
hdfs dfs -copyFromLocal /home/student93/project_reducer1.py /home/93student93/
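Hadoop Streaming's default text input reads files line by line and transparently decompresses codecs such as gzip and bzip2 based on the file extension, but Hadoop has no built-in codec for .zip archives, so zipped files reach the mapper as raw bytes rather than records. One workaround, sketched here under that assumption, is to repack the archives as .gz files on the local disk before copying them to HDFS. The script name repack_zips.py and the destination folder are hypothetical, and every archive member is assumed to be a text file.

```python
"""Sketch: repack local .zip archives as .gz files before uploading to HDFS."""
import gzip
import os
import shutil
import sys
import zipfile
from typing import List


def repack_zips_as_gzip(src_dir: str, dst_dir: str) -> List[str]:
    """Write every file inside each .zip in src_dir to dst_dir as a .gz file."""
    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for name in sorted(os.listdir(src_dir)):
        if not name.endswith(".zip"):
            continue
        stem = os.path.splitext(name)[0]
        with zipfile.ZipFile(os.path.join(src_dir, name)) as archive:
            for member in archive.infolist():
                if member.is_dir():
                    continue  # directory entries carry no data
                # Prefix with the archive name so members of different zips cannot collide
                out_name = f"{stem}_{os.path.basename(member.filename)}.gz"
                out_path = os.path.join(dst_dir, out_name)
                with archive.open(member) as src, gzip.open(out_path, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                written.append(out_path)
    return written


def main(argv: List[str]) -> None:
    if len(argv) != 3:
        print("usage: repack_zips.py SRC_DIR DST_DIR", file=sys.stderr)
        return
    for path in repack_zips_as_gzip(argv[1], argv[2]):
        print(path)


if __name__ == "__main__":
    main(sys.argv)
```

For example, python3 repack_zips.py /home/student93/Data /home/student93/Data_gz, followed by copying Data_gz (a hypothetical folder name) to HDFS and pointing -input at it instead of the zipped data.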

Running the mapper and reducer on the input with hadoop jar and saving the result to the output folder:

hadoop jar hadoop-streaming-2.9.0.jar \
  -input /home/93student93/Data \
  -output /home/93student93/temperatures_years \
  -mapper project_mapper1.py \
  -reducer project_reducer1.py \
  -file project_mapper1.py \
  -file project_reducer1.py

I tried changing the Python code because it has to deal with zip files, but I still couldn't get the expected result of extracting the years and temperatures and writing them to HDFS. The hadoop jar command also gives no proper output showing whether the task completed: I see "INFO mapreduce.Job: Running job: ", then it stops and nothing else is printed. I'm supposed to save the output of the job into a text file.
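Because the console stops at "Running job:", one way to narrow the problem down is to run the same two scripts locally on a small sample, outside Hadoop. This sketch mimics a cat | mapper | sort | reducer pipeline in Python: a crash in either script (for example an IndexError from sys.argv[1]) shows its traceback immediately. Sorting whole lines is only an approximation of Hadoop's shuffle, which sorts by key. The function names and the sample file in the usage note are hypothetical.

```python
"""Sketch: run a Streaming mapper and reducer locally to see their errors."""
import subprocess
import sys
from typing import List, Sequence


def run_step(cmd: Sequence[str], data: str) -> str:
    """Run one pipeline stage with `data` on stdin; raise with its stderr on failure."""
    result = subprocess.run(list(cmd), input=data, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the script's own traceback instead of a silent failure
        raise RuntimeError(f"{cmd} failed:\n{result.stderr}")
    return result.stdout


def simulate_streaming(lines: List[str], mapper: Sequence[str], reducer: Sequence[str]) -> str:
    """Feed lines through mapper, sort the mapper output, then feed it to reducer."""
    mapped = run_step(mapper, "".join(lines))
    shuffled = "".join(line + "\n" for line in sorted(mapped.splitlines()))
    return run_step(reducer, shuffled)
```

Usage, assuming a small local excerpt called sample.txt: simulate_streaming(open("sample.txt").readlines(), [sys.executable, "project_mapper1.py"], [sys.executable, "project_reducer1.py"]). On the cluster, the equivalent tracebacks typically end up in the YARN task logs rather than in the PuTTY console.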
