Reputation: 17
I want to convert a "pyspark.sql.dataframe.DataFrame" to pandas. At the last line, a "ConnectionRefusedError: [WinError 10061] Connection failed because the destination computer refused the connection" error occurred. How can I fix it?
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession, Row
import pandas as pd
import numpy as np
import os
import sys
# spark setting
# local
conf = SparkConf().set("spark.driver.host", "127.0.0.1")
sc = SparkContext(conf=conf)
# session
spark = SparkSession.builder.master("local[1]").appName("test_name").getOrCreate()
# file
path = "./data/fhvhv_tripdata_2022-10.parquet"
# add the "header" option if the file has a header
data = spark.read.option("header", True).parquet(path)
# Error occurred here
pd_df = data.toPandas()
Upvotes: 0
Views: 170
Reputation: 191973
First, ensure you're running pyspark 3.2 or higher, as that's where the pandas API on Spark (formerly Koalas) was added natively.
Then, connection errors can be caused by many things, but they have nothing to do with pandas. Your code is correct; it's the network/configuration that is not. For example, on Windows, you'll need to configure an external binary called winutils.
Note: you don't need a separate SparkContext here; you can pass all options via the SparkSession builder.
Otherwise, if you're not using Hadoop, don't use Spark at all; read the file directly with pandas instead: How to read a Parquet file into Pandas DataFrame?
Upvotes: 1