mathieu

Reputation: 2438

A file format writable by Python and readable as a DataFrame in Spark

I have Python scripts (no Spark involved) that produce data files, and I want those files to be easy to read as DataFrames in a Scala/Spark application.

What's the best file format for this?

Upvotes: 0

Views: 69

Answers (1)

lmm

Reputation: 17431

If your data doesn't contain embedded newlines, then a simple text-based format such as TSV is probably best.

If you need to include binary data, then a structured binary format like protobuf makes sense; anything for which a Hadoop InputFormat exists should be fine.

Upvotes: 1
