Nabih Bawazir

Reputation: 7255

File found on PySpark but not found in Pandas

When I run this:

filePath1 = '/tmp/cs/data_train/cs_subsprofile.csv'
fd_subsprofile = spark.read.format("csv").schema(schema_f_d_subs_profile).load(filePath1)

or

fd_subsprofile = spark.read.csv('/tmp/cs/data_train/cs_subsprofile.csv', header=True)

either one succeeds, but when I try

data = pd.read_csv('/tmp/cs/data_train/cs_subsprofile.csv')

I get:

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/cs/data_train/cs_subsprofile.csv'

Upvotes: 0

Views: 290

Answers (1)

pltc

Reputation: 6082

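Your Spark session is probably not reading from the local file system but from a distributed file system (such as HDFS). A path without a scheme, like `/tmp/cs/data_train/cs_subsprofile.csv`, is resolved against Hadoop's default file system (the `fs.defaultFS` setting), which on a cluster usually points to HDFS. Pandas only reads from the local file system, which is why it cannot find the file.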
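To make the point concrete, here is a simplified model (not Hadoop's actual code) of how a path without a scheme gets qualified against the default file system. The function name `qualify_path` and the `hdfs://namenode:8020` URI are hypothetical placeholders. On a real cluster you could inspect the actual setting with `spark.sparkContext._jsc.hadoopConfiguration().get("fs.defaultFS")`, though that goes through a private attribute.

```python
from urllib.parse import urlparse


def qualify_path(path: str, default_fs: str) -> str:
    """Simplified model of Hadoop path qualification (hypothetical helper).

    If the path already carries a scheme (hdfs://, file://, s3a://, ...),
    it is kept as-is; otherwise the default file system URI is prepended.
    """
    if urlparse(path).scheme:
        return path
    return default_fs.rstrip("/") + path


# Hypothetical cluster setting, standing in for fs.defaultFS.
default_fs = "hdfs://namenode:8020"
path = "/tmp/cs/data_train/cs_subsprofile.csv"

# A bare path ends up on HDFS from Spark's point of view...
print(qualify_path(path, default_fs))
# hdfs://namenode:8020/tmp/cs/data_train/cs_subsprofile.csv

# ...while an explicit file:// scheme forces the local file system.
print(qualify_path("file://" + path, default_fs))
# file:///tmp/cs/data_train/cs_subsprofile.csv
```

If the file really does live on HDFS, two usual workarounds are to convert the Spark DataFrame with `fd_subsprofile.toPandas()` (only sensible when the data fits in driver memory), or to copy the file to local disk first, for example with `hdfs dfs -get /tmp/cs/data_train/cs_subsprofile.csv /tmp/`, and then call `pd.read_csv` on the local copy.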

Upvotes: 2
