Koksi

Reputation: 51

Spark Streaming over TCP

I'm currently trying to get Spark Streaming over TCP working but I constantly get a "[Errno 111] Connection refused" error...

import socket
TCP_IP = 'localhost'
TCP_PORT = 40123
MESSAGE = "Test data Test data Test data"

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((TCP_IP, TCP_PORT))
s.send(MESSAGE)
s.close()

Spark part

import time
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

ssc = StreamingContext(sc,1)

lines = ssc.socketTextStream('localhost',40123)
counts = lines.flatMap(lambda line: line.split(" ")).map(lambda x: (x, 1)).reduceByKey(lambda a, b: a+b)
counts.pprint()

ssc.start()

Upvotes: 2

Views: 3984

Answers (2)

CPL

Reputation: 11

Try using the following TCP server code; it binds to the port and waits for the streaming job to connect before sending data.

import socket

data = 'abcdefg'
to_be_sent = data.encode('utf-8')

# Create a socket with the socket() system call
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Bind the socket to an address using the bind() system call
s.bind(('localhost', 40123))
# Enable the server to accept connections. The argument is the number of
# unaccepted connections the system will allow before refusing new ones.
s.listen(1)
# Accept a connection with the accept() system call
conn, addr = s.accept()
# Send the data. sendall() keeps sending bytes until either all data has
# been sent or an error occurs; it returns None on success.
conn.sendall(to_be_sent)
conn.close()

You may also need to define sc in your PySpark part as follows:

sc = SparkContext(appName="TestApp")
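Putting the pieces together, a minimal driver might look like the sketch below. It assumes the script is launched with spark-submit (so no SparkContext is pre-created, unlike in the pyspark shell) and keeps the original host, port, and 1-second batch interval; awaitTermination() is added so the job keeps running.

# Minimal sketch of the streaming driver, assuming it is launched with
# spark-submit rather than the pyspark shell (where sc already exists).
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="TestApp")      # create the context yourself
ssc = StreamingContext(sc, 1)             # 1-second batch interval

# The TCP server must already be listening on localhost:40123
lines = ssc.socketTextStream('localhost', 40123)
counts = (lines.flatMap(lambda line: line.split(" "))
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()                    # keep the driver alive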

Upvotes: 0

ziXiong
ziXiong

Reputation: 76

socketTextStream cannot host a server; it is only a client. You have to run a TCP server yourself, and then Spark Streaming can connect to it.
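For example, a minimal long-running server could look like the sketch below (the port and message are placeholders; socketTextStream reads the bytes as UTF-8, newline-delimited lines, so each record is terminated with '\n'):

# Minimal sketch of a TCP server for Spark Streaming to connect to.
# Assumptions: port 40123 is free, and each record is a newline-terminated
# UTF-8 line, which is what socketTextStream expects.
import socket
import time

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('localhost', 40123))
server.listen(1)

conn, addr = server.accept()              # blocks until the streaming job connects
try:
    while True:
        conn.sendall(b"Test data Test data Test data\n")
        time.sleep(1)                     # emit one line per second
except (BrokenPipeError, ConnectionResetError):
    pass                                  # the streaming job disconnected
finally:
    conn.close()
    server.close()

Start this server first, then launch the Spark Streaming job so that the connection attempt no longer gets refused.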

Upvotes: 1
