Leonius

Reputation: 81

How to read bz2 files into dataframes using pyspark?

I can read a JSON file into a DataFrame in PySpark using:

spark = SparkSession.builder.appName('GetDetails').getOrCreate()
df = spark.read.json("path to json file")

However, when I try to read a bz2-compressed CSV file into a DataFrame, I get an error. I am using:

spark = SparkSession.builder.appName('GetDetails').getOrCreate()
df = spark.read.load("path to bz2 file")

Could you please help me correct this?

Upvotes: 8

Views: 4920

Answers (1)

Serhii Sokolenko

Reputation: 6164

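The method spark.read.load() has an optional format parameter, which defaults to 'parquet' (more precisely, to the value of spark.sql.sources.default, which is parquet unless you have changed it). Without it, Spark tries to parse your bz2 file as Parquet and fails.

So for your code to work, you need to pass the format explicitly. For a bz2-compressed JSON file, for example: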


df = spark.read.load("data.json.bz2", format="json")

Also, spark.read.json works directly on compressed JSON files, e.g.:


df = spark.read.json("data.json.bz2")
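
Since the question is about a compressed CSV rather than JSON, the same idea should carry over to the CSV reader. Below is a minimal sketch, not a definitive implementation: the helper names (write_sample_bz2_csv, read_bz2_csv, demo) and the sample file are hypothetical, and it assumes the CSV has a header row. Spark selects the bzip2 codec from the .bz2 file extension, so no compression option is passed.

```python
import bz2
import csv
import os
import tempfile


def write_sample_bz2_csv(path):
    """Write a tiny bz2-compressed CSV (hypothetical sample data) to `path`."""
    rows = [("id", "name"), ("1", "alice"), ("2", "bob")]
    with bz2.open(path, "wt", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)


def read_bz2_csv(spark, path):
    """Read a bz2-compressed CSV into a DataFrame.

    Assumes the file has a header row. The codec is picked up from the
    .bz2 extension, so only CSV options are passed.
    """
    return spark.read.csv(path, header=True, inferSchema=True)


def demo():
    """End-to-end example; needs pyspark installed and a working Java runtime."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("GetDetails").getOrCreate()
    path = os.path.join(tempfile.mkdtemp(), "sample.csv.bz2")
    write_sample_bz2_csv(path)

    df = read_bz2_csv(spark, path)
    df.show()

    # Equivalent call through load(), with the format given explicitly
    df2 = spark.read.load(path, format="csv", header=True, inferSchema=True)
    df2.show()

    spark.stop()
```

Calling demo() runs the whole thing against a temporary file. For your own data, read_bz2_csv(spark, "path to bz2 file") is the only part you need.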

Upvotes: 2
