Using pyspark to read json file directly from a website

Question

Is it possible to use sqlContext to read a JSON file directly from a website? For instance, I can read a local file like this:

myRDD = sqlContext.read.json("sample.json")

but I get an error when I try something like this:

myRDD = sqlContext.read.json("http://192.168.0.13:9200/sample.json")

I'm using Spark 1.4.1. Thanks in advance!

zero323 · Accepted Answer

It is not possible. The paths you pass should point either to the local file system or to another file system supported by Hadoop. As long as sample.json has the expected format (a single JSON object per line), you can download it on the driver and build the DataFrame from the parsed records, something like this:

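import json
import requests

# Fetch the file over HTTP on the driver; fail early on a 4xx/5xx response.
r = requests.get("http://192.168.0.13:9200/sample.json")
r.raise_for_status()

# One JSON object per line; skip blank lines before parsing.
rows = [json.loads(line) for line in r.iter_lines() if line]
df = sqlContext.createDataFrame(rows)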
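If you need this in more than one place, the same idea can be split into a small parser and a fetch helper. This is only a sketch: the names parse_json_lines and fetch_json_lines and the timeout value are made up for illustration, and it assumes the file really is one JSON object per line and small enough to hold in driver memory.

```python
import json


def parse_json_lines(lines):
    """Parse an iterable of JSON-lines records (bytes or str) into dicts.

    Blank or whitespace-only lines are skipped instead of raising.
    """
    records = []
    for raw in lines:
        text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if text.strip():
            records.append(json.loads(text))
    return records


def fetch_json_lines(url, timeout=10):
    """Download `url` and return its parsed records; raises on HTTP errors.

    Hypothetical helper; `timeout` is an illustrative default.
    """
    import requests  # imported lazily so the parser works without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return parse_json_lines(response.iter_lines())


# Usage inside a PySpark session where `sqlContext` already exists:
# records = fetch_json_lines("http://192.168.0.13:9200/sample.json")
# df = sqlContext.createDataFrame(records)
```

Note that the download happens entirely on the driver, so this suits small files. For anything large, it is probably better to copy the file to a file system Spark can read and point sqlContext.read.json at that path.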
