Rahul Prasad

Reputation: 8222

Is it good to use JSON encoding when logging data?

I have to log a lot of data, which will be analyzed later on. I am not analyzing it yet; later on we will analyze it using Hadoop. How? I don't know yet. But the logs are already very large.

So I am looking for a format that takes up less space and will be easy to analyze later on.

I thought of saving it as comma-separated values, but the log may contain commas and newlines. Then I thought of encoding it as JSON, or Base64-encoding each field, but I don't know whether we will be able to analyze it later on.

What log format should I use that will be easier to analyze later on?

Upvotes: 2

Views: 115

Answers (3)

Rahul Prasad

Reputation: 8222

As suggested by one of the engineers at www.qubole.com, I used the CSV format, because querying terabytes of log files with Hadoop is more expensive (time-consuming) when the lines are JSON-encoded.

Upvotes: 1

Dmitri Goldring

Reputation: 4363

As long as you generate your log statements with a well-structured format string, you should be able to usefully parse them later, likely with a regular expression.
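
For example, a minimal sketch in PHP, assuming a hypothetical fixed line format of "timestamp level message" (the format and field names here are illustrative, not from the question):

<?php
// A minimal sketch, assuming a hypothetical log line such as:
// "2013-01-15 10:22:01 ERROR something went wrong"
$line = '2013-01-15 10:22:01 ERROR something went wrong';
if (preg_match('/^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$/', $line, $m)) {
    list(, $timestamp, $level, $message) = $m;
    // $timestamp, $level and $message are now separate fields
}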

JSON will bloat your log horribly and not improve your ability to parse it. The only scenario where it might make sense is when you need to dump objects in your log.
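
For illustration, that object-dumping case might look like this (the context array is made up for the example):

<?php
// Illustration only: attach structured context to a plain log line.
$context = array('user_id' => 42, 'action' => 'login');
error_log('user event ' . json_encode($context));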

Upvotes: 2

Halcyon

Reputation: 57723

CSV allows you to escape data like this:

1,2,"value with, comma","value with
newline","value with "" quote"
1,2,"foo","bar","baz"

So commas or newlines are no problem. Use PHP's fputcsv() when writing to the file.
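
A short sketch of that, mirroring the sample rows above (the log path is hypothetical):

<?php
// fputcsv handles the quoting of commas, newlines and quotes for you.
$fh = fopen('/var/log/app.csv', 'a'); // hypothetical log path
fputcsv($fh, array(1, 2, 'value with, comma', "value with\nnewline", 'value with " quote'));
fclose($fh);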

CSV probably gets you the smallest filesize since the delimiter overhead is minimal.


If space is an issue you can always just gzip-compress the files.
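
One way to do that in PHP, if you would rather write the log compressed from the start (requires the zlib extension; the path is hypothetical):

<?php
// 'w9' = write mode, maximum compression.
$gz = gzopen('/var/log/app.csv.gz', 'w9');
gzwrite($gz, '1,2,"foo","bar","baz"' . "\n");
gzclose($gz);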


Base64 typically inflates data by about 33%.
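
You can verify that overhead quickly: Base64 emits 4 output bytes for every 3 input bytes.

<?php
$data = str_repeat('x', 300);
echo strlen($data), "\n";                // 300
echo strlen(base64_encode($data)), "\n"; // 400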

Upvotes: 1
