I want to read a local file into Spark on Windows. I used the following command:
input = sc.textFile("D://sample.txt")
I tried every path combination I could think of, but I keep getting the following (or a very similar) error.
I tried the following:
adding the prefix file:/// or file://
adding the prefix file:\\ or file:\
D:/sample.txt
D:\sample.txt
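For comparison, here is a minimal sketch of how a three-slash file URI for this path could be built with Python's standard pathlib module. It only shows the string construction and does not call Spark; whether Spark accepts this form on a given Windows setup is an assumption, not something the snippet verifies.

```python
from pathlib import PureWindowsPath

# PureWindowsPath applies Windows path rules on any OS, so this runs anywhere.
local_path = PureWindowsPath(r"D:\sample.txt")

# as_posix() swaps backslashes for forward slashes: "D:/sample.txt".
# Prefixing "file:///" gives a three-slash file URI. No percent-encoding is
# applied, so this assumes the path has no spaces or special characters.
uri = "file:///" + local_path.as_posix()
print(uri)  # file:///D:/sample.txt

# Assumed usage with an existing SparkContext `sc` (not run here):
# input = sc.textFile(uri)
```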
The current working directory is D:\, and the file exists at D:\sample.txt.
Does anyone have an idea?
>>> input = sc.textFile("D://sample.txt")
15/10/27 02:37:37 INFO MemoryStore: ensureFreeSpace(157288) called with curMem=7891904, maxMem=556038881
15/10/27 02:37:37 INFO MemoryStore: Block broadcast_46 stored as values in memory (estimated size 153.6 KB, free 522.6 MB)
15/10/27 02:37:37 INFO MemoryStore: ensureFreeSpace(14276) called with curMem=8049192, maxMem=556038881
15/10/27 02:37:37 INFO MemoryStore: Block broadcast_46_piece0 stored as bytes in memory (estimated size 13.9 KB, free 522.6 MB)
15/10/27 02:37:37 INFO BlockManagerInfo: Added broadcast_46_piece0 in memory on localhost:52887 (size: 13.9 KB, free: 529.6 MB)
15/10/27 02:37:37 INFO SparkContext: Created broadcast 46 from textFile at null:-1
Also, do we always have to use backslashes on the command line on Windows, or only for directory paths?
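Part of the backslash question is about Python string literals rather than Windows itself. The sketch below uses a hypothetical path, D:\temp\new.txt, to show why unescaped backslashes are risky inside Python code and how doubled backslashes, raw strings, or forward slashes avoid the problem. It does not cover how Spark or Hadoop interpret the resulting string.

```python
from pathlib import PureWindowsPath

escaped = "D:\\temp\\new.txt"   # each "\\" in the literal is a single backslash
raw = r"D:\temp\new.txt"        # raw string: backslashes are kept as-is
forward = "D:/temp/new.txt"     # forward slashes need no escaping at all

# Without escaping, "\t" and "\n" are read as a tab and a newline.
broken = "D:\temp\new.txt"

print(escaped == raw)   # True
print("\t" in broken)   # True: the path has been silently corrupted

# Windows path rules treat "/" and "\" as equivalent separators.
print(PureWindowsPath(forward) == PureWindowsPath(raw))  # True
```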
Can you give os.path.normpath a try?
import os
input = sc.textFile(os.path.normpath("D:/sample.txt"))
os.path.normpath(path)
Normalize a pathname by collapsing redundant separators and up-level references so that A//B, A/B/, A/./B and A/foo/../B all become A/B. This string manipulation may change the meaning of a path that contains symbolic links. On Windows, it converts forward slashes to backward slashes. To normalize case, use normcase().
Source: https://docs.python.org/2/library/os.path.html#os.path.normpath
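On Windows, os.path is the ntpath module, so the effect of this suggestion can be previewed on any OS by calling ntpath.normpath directly. This is a minimal sketch; the third path, D:/data/../sample.txt, is a hypothetical example of an up-level reference. It only shows the string rewriting and does not test whether Spark then opens the file.

```python
import ntpath  # the Windows implementation of os.path; importable on any OS

# Different spellings of the same location, including the one from the question.
candidates = ["D:/sample.txt", "D://sample.txt", "D:/data/../sample.txt"]
normalized = {p: ntpath.normpath(p) for p in candidates}

for original, result in normalized.items():
    # Each line prints the original spelling followed by D:\sample.txt
    print(original, "->", result)
```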