user2454360

Reputation: 91

How to read a CSV file from HDFS?

My data is in a CSV file stored in HDFS, and I want to read it.

Can anyone help me with the code?

I'm new to Hadoop. Thanks in advance.

Upvotes: 2

Views: 21769

Answers (2)

Tariq

Reputation: 34184

The classes you need are FileSystem, FSDataInputStream and Path. A client looks something like this:

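    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        try (FSDataInputStream inputStream = fs.open(new Path("/path/to/input/file"));
             BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // each line is one CSV record
            }
        }
    }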

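FSDataInputStream extends java.io.DataInputStream, so it has several read methods; choose the one that suits your needs. For a text file such as a CSV, wrapping the stream in a BufferedReader and calling readLine() is usually the simplest.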
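Once you have a line, you still need to split it into fields. A plain `line.split(",")` breaks on commas inside quoted fields, so here is a minimal sketch of a quote-aware splitter. It assumes RFC 4180-style quoting (fields wrapped in double quotes, `""` for a literal quote) and no line breaks inside fields. `CsvLineSplitter` is a hypothetical helper, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not a Hadoop API.
public class CsvLineSplitter {

    /**
     * Splits one CSV line into fields. Assumes RFC 4180-style quoting:
     * a quoted field may contain commas, and "" inside quotes is a literal quote.
     * Does not handle fields that span multiple lines.
     */
    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false;     // closing quote
                    }
                } else {
                    current.append(c);        // any character, including ',', is literal here
                }
            } else if (c == '"') {
                inQuotes = true;              // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // field separator outside quotes
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());       // last field (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("1,\"Smith, John\",\"say \"\"hi\"\"\","));
    }
}
```

The same method can be called on `value.toString()` inside a mapper when your CSV contains quoted fields.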

If you are using MapReduce (MR), it's even easier:

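    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public static class YourMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // The framework does the reading for you: value holds one line of your CSV file.
            String line = value.toString();
            // Naive split; does not handle commas inside quoted fields.
            String[] fields = line.split(",", -1);
            // Do your processing here. As an example, emit the first column as the key
            // and the whole line as the value; pick output types that suit your job.
            context.write(new Text(fields[0]), value);
        }
    }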

Upvotes: 6

dino.keco

Reputation: 1401

If you want to use MapReduce, you can use TextInputFormat to read the file line by line and parse each line in the mapper's map function.
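To see what TextInputFormat hands to each map() call, here is a plain-Java simulation (no Hadoop classes; `LineRecords` and `LineRecord` are hypothetical names). The key is the byte offset at which the line starts in the file and the value is the line itself. A common use of the key is skipping a CSV header, because the first line of a file is the record with offset 0. This is simplified: Hadoop's real line reader also handles `\r\n` endings and input splits.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of TextInputFormat's (offset, line) records; not Hadoop code.
public class LineRecords {

    /** One (key, value) pair, mirroring the LongWritable/Text pair a mapper receives. */
    static final class LineRecord {
        final long offset; // byte offset where the line starts in the file
        final String line; // line contents without the trailing '\n'

        LineRecord(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /** Splits raw file bytes on '\n' and remembers each line's start offset (does not strip '\r'). */
    static List<LineRecord> toRecords(byte[] data) {
        List<LineRecord> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            boolean endOfLine = i < data.length && data[i] == '\n';
            boolean endOfData = i == data.length;
            // Emit at every newline, and at end of data only if a partial last line remains.
            if (endOfLine || (endOfData && i > start)) {
                records.add(new LineRecord(start,
                        new String(data, start, i - start, StandardCharsets.UTF_8)));
                start = i + 1;
            }
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] csv = "id,name\n1,alice\n2,bob\n".getBytes(StandardCharsets.UTF_8);
        for (LineRecord r : toRecords(csv)) {
            if (r.offset == 0) {
                continue; // header line; in a mapper: if (key.get() == 0) return;
            }
            System.out.println(r.offset + " -> " + r.line);
        }
        // prints "8 -> 1,alice" then "16 -> 2,bob"
    }
}
```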

Another option is to write (or find an existing) CSV input format for reading data from the file.

There is an old tutorial here: http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html, but the logic is the same in newer versions.

If you read the file from a single process (without MapReduce), it is the same as reading a file from any other file system. There is a nice example here: https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-read-a-file-from-hdfs
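Because `FileSystem.open()` returns an ordinary `java.io.InputStream` subclass, the reading logic does not need to know it is talking to HDFS. Here is a minimal sketch that reads CSV rows from any `InputStream`; in a Hadoop client you would pass it `fs.open(path)`, and it works the same with a `FileInputStream` or, as in `main` below, an in-memory stream. `CsvStreamReader` is a hypothetical name, and the plain comma split assumes UTF-8 text with no quoted fields.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not a Hadoop API.
public class CsvStreamReader {

    /**
     * Reads every line of a CSV stream and splits it on commas.
     * Assumes UTF-8 text and no quoted fields containing commas.
     * Works for any InputStream, e.g. the FSDataInputStream from FileSystem.open().
     */
    public static List<String[]> readAll(InputStream in) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;                  // skip blank lines
                }
                rows.add(line.split(",", -1)); // -1 keeps trailing empty fields
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for fs.open(new Path(...)); any InputStream is read the same way.
        InputStream in = new ByteArrayInputStream(
                "id,name\n1,alice\n2,bob\n".getBytes(StandardCharsets.UTF_8));
        for (String[] row : readAll(in)) {
            System.out.println(String.join(" | ", row));
        }
        // prints "id | name", "1 | alice", "2 | bob" on separate lines
    }
}
```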

HTH

Upvotes: 2
