Trams

Reputation: 421

How many types of InputFormat are there in Hadoop?

I'm new to Hadoop and wondering how many types of InputFormat there are in Hadoop, such as TextInputFormat. Is there an InputFormat that I can use to read files via HTTP requests to remote data servers?

Thanks :)

Upvotes: 8

Views: 8741

Answers (2)

Ravindra babu

Reputation: 38910

There are many classes implementing InputFormat:

CombineFileInputFormat, CombineSequenceFileInputFormat, 
CombineTextInputFormat, CompositeInputFormat, DBInputFormat,
FileInputFormat, FixedLengthInputFormat, KeyValueTextInputFormat, 
MultiFileInputFormat, NLineInputFormat, Parser.Node, 
SequenceFileAsBinaryInputFormat, SequenceFileAsTextInputFormat, 
SequenceFileInputFilter, SequenceFileInputFormat, TextInputFormat

Have a look at this article on when to use which type of InputFormat.

Of these, the most frequently used formats are:

  • FileInputFormat : Base class for all file-based InputFormats
  • KeyValueTextInputFormat : An InputFormat for plain text files. Files are broken into lines. Either a line feed or a carriage return signals the end of a line. Each line is divided into key and value parts by a separator byte. If no such byte exists, the key is the entire line and the value is empty.
  • TextInputFormat : An InputFormat for plain text files. Files are broken into lines. Either a line feed or a carriage return signals the end of a line. Keys are the position in the file, and values are the line of text.
  • NLineInputFormat : Splits the input so that every N lines form one split. In many "pleasantly" parallel applications, each process/mapper processes the same input file(s), but the computations are controlled by different parameters.
  • SequenceFileInputFormat : An InputFormat for SequenceFiles.
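To make the difference between TextInputFormat and KeyValueTextInputFormat concrete, here is a minimal plain-Java sketch that mimics the records each one would hand to a mapper. It does not use the Hadoop classes: RecordSketch, textRecords and keyValueRecords are hypothetical names made up for this illustration, and it assumes lines end with a line feed only and that the separator is a tab.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustration only: a hypothetical helper, not part of Hadoop.
final class RecordSketch {

    // Mimics TextInputFormat: key = byte offset of the line start, value = the line.
    public static List<Map.Entry<Long, String>> textRecords(String content) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            records.add(new AbstractMap.SimpleEntry<Long, String>(offset, line));
            // Advance past the line's bytes plus the one-byte '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // Mimics KeyValueTextInputFormat: split each line at the first separator.
    // If the separator is missing, the whole line is the key and the value is empty.
    public static List<Map.Entry<String, String>> keyValueRecords(String content, char separator) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        for (String line : content.split("\n")) {
            int pos = line.indexOf(separator);
            String key = pos < 0 ? line : line.substring(0, pos);
            String value = pos < 0 ? "" : line.substring(pos + 1);
            records.add(new AbstractMap.SimpleEntry<String, String>(key, value));
        }
        return records;
    }

    public static void main(String[] args) {
        String content = "a\t1\nbb\t22\nnokey";
        System.out.println(textRecords(content));           // keys are offsets 0, 4, 10
        System.out.println(keyValueRecords(content, '\t')); // keys are a, bb, nokey
    }
}
```

In a real job the format is chosen on the Job object, for example with job.setInputFormatClass(KeyValueTextInputFormat.class), and Hadoop produces these records for you.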

Regarding your second query: fetch the files from the remote servers first, then use the appropriate InputFormat for their contents. Hadoop works best when the data is local to the nodes that process it (data locality).
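One possible way to do the "fetch first" step is sketched below, using only the JDK. RemoteFetchSketch and fetchToLocal are hypothetical names for this example, and the URL and target path are whatever you pass on the command line. It is a rough sketch, not a robust downloader (no retries, timeouts or authentication).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Illustration only: a hypothetical helper, not part of Hadoop.
final class RemoteFetchSketch {

    // Streams the resource behind the URI into a local staging file
    // and returns the number of bytes copied.
    public static long fetchToLocal(URI source, Path target) throws IOException {
        try (InputStream in = source.toURL().openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: RemoteFetchSketch <url> <local-file>");
            return;
        }
        long bytes = fetchToLocal(URI.create(args[0]), Paths.get(args[1]));
        System.out.println("Fetched " + bytes + " bytes into " + args[1]);
    }
}
```

Once the file is staged locally, it can be copied into HDFS (for example with hdfs dfs -put) and read by the job with whichever InputFormat matches its contents.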

Upvotes: 7

Durga Viswanath Gadiraju

Reputation: 3956

Your first question: how many types of InputFormat are there in Hadoop, such as TextInputFormat?

  1. TextInputFormat - each line is treated as a value
  2. KeyValueTextInputFormat - the text before the first delimiter is the key and the rest of the line is the value
  3. FixedLengthInputFormat - each fixed-length record is treated as a value
  4. NLineInputFormat - every N lines form one input split (each line is still one record)
  5. SequenceFileInputFormat - for binary SequenceFiles

Also, there is DBInputFormat to read from databases.
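As a rough illustration of NLineInputFormat, the plain-Java sketch below groups lines the way the Hadoop Javadoc describes it: every N lines become one input split, and each line inside a split is still delivered to the mapper as its own record. NLineSketch and splitEveryNLines are hypothetical names for this example, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a hypothetical helper, not part of Hadoop.
final class NLineSketch {

    // Groups the input lines into chunks of at most n lines; each chunk
    // stands in for one input split (and therefore one map task).
    public static List<List<String>> splitEveryNLines(List<String> lines, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += n) {
            int end = Math.min(start + n, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Five parameter lines, two lines per split: the last split is short.
        List<String> lines = Arrays.asList("p1", "p2", "p3", "p4", "p5");
        System.out.println(splitEveryNLines(lines, 2)); // [[p1, p2], [p3, p4], [p5]]
    }
}
```

In an actual job, the number of lines per split is set with NLineInputFormat.setNumLinesPerSplit(job, n).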

Your second question: there is no InputFormat for reading files via HTTP requests.

Upvotes: 4
