Reputation: 2438
I have Python scripts (no Spark involved) that produce data files, and I want those files to be easy to read as DataFrames in a Scala/Spark application.
Which file format is the best choice?
Upvotes: 0
Views: 69
Reputation: 17431
If your data doesn't contain newlines, then a simple text-based format such as TSV is probably best.
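As a rough illustration of the TSV route, here is a minimal sketch of the Python side. The function name `write_tsv`, the file name `events.tsv`, and the sample columns are all made up for the example. It rejects any field containing a tab or newline rather than trying to escape it, which is an assumption about how you want to handle such data.

```python
import os
import tempfile


def write_tsv(path, header, rows):
    """Write a header and rows as tab-separated text, one record per line."""

    def clean(value):
        # Refuse fields that would break the one-record-per-line layout.
        text = str(value)
        if any(ch in text for ch in "\t\r\n"):
            raise ValueError(f"field contains a tab or newline: {text!r}")
        return text

    # newline="" stops Python from translating "\n" on Windows.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("\t".join(clean(h) for h in header) + "\n")
        for row in rows:
            f.write("\t".join(clean(v) for v in row) + "\n")


# Example: write two hypothetical records to a temporary file and read them back.
out_path = os.path.join(tempfile.mkdtemp(), "events.tsv")
write_tsv(out_path, ["id", "name", "score"], [[1, "alice", 0.5], [2, "bob", 0.75]])
with open(out_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

On the Spark side, something like `spark.read.option("sep", "\t").option("header", "true").csv(path)` should load such a file as a DataFrame. Columns come back as strings unless you supply a schema or enable `inferSchema`.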
If you need to include binary data, then a serialized format like protobuf makes sense; anything for which a Hadoop InputFormat exists should be fine.
Upvotes: 1