Developer Rajinikanth

Reputation: 354

How to read an Excel file in a PySpark Databricks notebook

How can I read an .xlsx file in an Azure Databricks notebook with PySpark? We tried the code below but got an error.

import pandas as pd
spark.createDataFrame(pd.read_excel("/Volumes/test/vls/data/empty data.xlsx"))

Is it possible to read the xlsx format without an external library?

Error: PySparkTypeError: Exception thrown when converting pandas.Series (object) with name

Upvotes: -1

Views: 95

Answers (1)

JayashankarGS

Reputation: 7985

Below are possible approaches that do not require installing an external library.

Use pandas to read the file and create a Spark DataFrame from it. Sample code:

import pandas as pd

spark.createDataFrame(pd.read_excel("path_to_excel_file/sample5000.xlsx")).display()

Output:

Unnamed: 0  First Name  Last Name  Gender  Country        Age  Date        Id
1           Dulce       Abril      Female  United States  32   15/10/2017  1562
2           Mara        Hashimoto  Female  Great Britain  25   16/08/2016  1582
3           Philip      Gent       Male    France         36   21/05/2015  2587
4           Kathleen    Hanner     Female  United States  25   15/10/2017  3549

or

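# "import pyspark" alone does not load the pyspark.pandas submodule
import pyspark.pandas as ps

# Convert to a Spark DataFrame so the Databricks display() method is available
ps.read_excel("file:/<file_path>/sample5000.xlsx").to_spark().display()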
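
If the pandas route raises the PySparkTypeError from the question, one common cause is a column that pandas loads as object dtype with mixed value types (for example numbers and text in the same Excel column), which Spark cannot map to a single type. Below is a minimal sketch of a workaround, assuming that is the cause here: convert every object column to text before calling spark.createDataFrame. The helper name stringify_object_columns is hypothetical and not part of pandas or PySpark.

```python
import pandas as pd


def stringify_object_columns(pdf: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose object-dtype columns hold only text or missing values."""
    out = pdf.copy()
    # Make every column name a string; an Excel header row can contain numbers.
    out.columns = [str(c) for c in out.columns]
    for col in out.columns:
        if out[col].dtype == object:
            # Keep missing cells as None and turn every other value into text.
            out[col] = out[col].map(lambda v: None if pd.isna(v) else str(v))
    return out


# Usage in a Databricks notebook (spark is the notebook's session):
# pdf = pd.read_excel("path_to_excel_file/sample5000.xlsx")
# spark.createDataFrame(stringify_object_columns(pdf)).display()
```

Numeric and date columns are left untouched, so only the columns that pandas could not type on its own end up as strings in Spark.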

If you don't want to use pandas, the only other way is to install the com.crealytics.spark.excel library on the cluster.
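
For the library route, reading could look roughly like the sketch below. It assumes the spark-excel library is already installed on the cluster and that the first row of the sheet holds column names. The wrapper name read_excel_with_spark is hypothetical; the format string and the header and inferSchema options come from the spark-excel data source.

```python
def read_excel_with_spark(spark, path):
    """Read an .xlsx file with the spark-excel data source.

    Assumption: the com.crealytics spark-excel library is installed on the
    cluster; without it, load() fails because the format cannot be found.
    """
    return (
        spark.read.format("com.crealytics.spark.excel")
        .option("header", "true")       # treat the first row as column names
        .option("inferSchema", "true")  # let the reader guess column types
        .load(path)
    )
```

In a notebook this would be called as read_excel_with_spark(spark, "/Volumes/test/vls/data/empty data.xlsx").display(), using the path from the question.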

Upvotes: 1
