Deepthi

Reputation: 79

Performance issue with JSON input

I am loading a MySQL table from a MongoDB source through Kettle. The MongoDB collection has more than 4 million records, and when I run the Kettle job the first full load takes 17 hours to finish. Even an incremental load takes more than an hour. I tried increasing the commit size and giving more memory to the job, but performance still does not improve. I think the JSON Input step takes a very long time to parse the data, which is why it is so slow. I have these steps in my transformation:

  1. MongoDB Input
  2. JSON Input
  3. Strings cut
  4. If field value is null
  5. Concat fields
  6. Select values
  7. Table output

Extracting the same 4 million records from Postgres was much faster than from MongoDB. Is there a way I can improve the performance? Please help me.

Thanks, Deepthi

Upvotes: 1

Views: 1423

Answers (1)

Codek

Reputation: 5164

Run multiple copies of the step. It sounds like you have a MongoDB Input step followed by a JSON Input step to parse the JSON results, right? So launch 4 or 8 copies of the JSON Input step (right-click the step and change the number of copies to start), or more depending on how many CPUs you have, and it will speed up.

Alternatively, do you really need to parse the full JSON? Maybe you can extract the data you need with a regex or something simpler.
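In case it helps to see the idea outside of Kettle, here is a minimal sketch of the difference between a full JSON parse and a targeted regex extraction. The document shape and the `name`/`city` field names are made up for illustration; inside PDI you would do the equivalent with something like a Regex Evaluation step (or a scripting step) in place of the JSON Input step.

```python
import json
import re
import timeit

# A sample document as it might come out of the MongoDB Input step
# (shape and field names are assumptions for this sketch).
doc = '{"_id": "507f1f77bcf86cd799439011", "name": "Deepthi", "city": "Bangalore", "amount": 42.5}'

# Full parse: builds a dict for every field, even the ones you never use.
def full_parse(s):
    d = json.loads(s)
    return d["name"], d["city"]

# Targeted regex: pulls out only the two fields you actually need.
NAME_RE = re.compile(r'"name"\s*:\s*"([^"]*)"')
CITY_RE = re.compile(r'"city"\s*:\s*"([^"]*)"')

def regex_extract(s):
    return NAME_RE.search(s).group(1), CITY_RE.search(s).group(1)

# Rough comparison of the two approaches over many rows.
print("full parse:   ", timeit.timeit(lambda: full_parse(doc), number=100_000))
print("regex extract:", timeit.timeit(lambda: regex_extract(doc), number=100_000))
```

The regex approach only holds up if the documents are flat and predictably formatted; once you have nested objects or escaped quotes, you are better off keeping the JSON Input step and parallelising it as described above.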

Upvotes: 0
