Reputation: 53
I need your help, please. I have raw JSON data files stored in a timestamp-based folder structure. When I run the crawler, it detects 116 columns but misses 5 columns that are present in the files but occur with very low frequency. Can somebody let me know how I can get those 5 missing columns detected?
The structure of the file is:
{"serialNumber":"PNRF","delivered":1601656317296,"timestamp":"1601656317","ecd4":"-5","pt":"PTR"}
{"serialNumber":"PNRT","delivered":1601656317296,"timestamp":"1601656317","ecd4":"-5","pt":"PIF0"}
Upvotes: 3
Views: 931
Reputation: 3387
I have faced similar issues with the Glue crawler. You have two options to solve it:
Both of these solutions have been tested in a project.
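Separately from the two options, it can help to confirm which keys are actually rare before changing anything. Here is a minimal sketch, assuming the files are JSON Lines (one object per line, as in the question's sample) and that you can read them locally or after downloading them. The function name `count_keys` and the `rareField` record are made up for illustration and are not from the original data.

```python
import json
from collections import Counter


def count_keys(lines):
    """Count how many records contain each top-level key."""
    counts = Counter()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts.update(record.keys())
        total += 1
    return counts, total


# The first two records come from the question; the third is a
# hypothetical record carrying a low-frequency key.
sample = [
    '{"serialNumber":"PNRF","delivered":1601656317296,"timestamp":"1601656317","ecd4":"-5","pt":"PTR"}',
    '{"serialNumber":"PNRT","delivered":1601656317296,"timestamp":"1601656317","ecd4":"-5","pt":"PIF0"}',
    '{"serialNumber":"PNRX","rareField":"1"}',
]

counts, total = count_keys(sample)

# List keys from rarest to most common.
for key, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print(f"{key}: {n}/{total}")
```

The keys at the top of that listing are the candidates to compare against the 116 columns the crawler reported.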
Upvotes: 2