Narender nayyar

Reputation: 53

AWS Glue crawler not able to read all the columns

I need your help, please. I have raw JSON data spread across many files in a timestamp-based folder structure. When I run the crawler, it detects 116 columns but misses 5 columns that are present in the files with a very low frequency. Can somebody tell me how to get the crawler to detect these 5 missing columns?

The structure of the file is:

{"serialNumber":"PNRF","delivered":1601656317296,"timestamp":"1601656317","ecd4":"-5","pt":"PTR"} 
{"serialNumber":"PNRT","delivered":1601656317296,"timestamp":"1601656317","ecd4":"-5","pt":"PIF0"}

Upvotes: 3

Views: 931

Answers (1)

amsh

Reputation: 3387

I have faced similar issues with the Glue crawler. You have two options to solve it:

  • Manually add the missing columns via Databases -> Tables -> click the table -> Edit Schema -> Add column. The table will then show the updated schema.
  • If there is a data manipulation stage before cataloging, add the missing columns to every record with a None value, so that each record carries the full set of columns.

Both of these solutions have been tested in a project.
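For the second option, here is a minimal sketch of what such a manipulation stage could look like, assuming the files are newline-delimited JSON like the sample in the question. The helper names `collect_columns` and `pad_records` and the field `rareField` are made up for illustration; they are not part of Glue or of the original data. The first pass collects every key seen in any record, and the second pass writes each record back with all of those keys, using None (JSON `null`) where a value is missing.

```python
import json


def collect_columns(lines):
    """Return every key seen in any record, in first-seen order."""
    columns = {}  # dict used as an ordered set
    for line in lines:
        line = line.strip()
        if line:
            for key in json.loads(line):
                columns.setdefault(key, None)
    return list(columns)


def pad_records(lines, columns):
    """Yield each record as a JSON line that contains every column.

    Keys missing from a record are filled with None (serialized as null).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        yield json.dumps({col: record.get(col) for col in columns})


# Two sample records; "rareField" is a hypothetical low-frequency column.
raw = [
    '{"serialNumber":"PNRF","delivered":1601656317296,"timestamp":"1601656317","ecd4":"-5","pt":"PTR"}',
    '{"serialNumber":"PNRT","delivered":1601656317296,"timestamp":"1601656317","ecd4":"-5","pt":"PIF0","rareField":"x"}',
]

columns = collect_columns(raw)             # union of keys across all records
padded = list(pad_records(raw, columns))   # every line now has the same keys
```

In a real pipeline you would probably read the lines from the source files and write the padded lines to the location the crawler scans. Whether the crawler then picks up the padded columns is something to verify on your own data.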

Upvotes: 2
