ak_lawati

How to retrieve the location when streaming Twitter data using PySpark

I am working on streaming tweets in real time using PySpark.

I want to retrieve the text, location and username of each tweet. Currently, I am receiving only the tweet text. Is there any way to get the location as well?

lines = ssc.socketTextStream("localhost", 5550)

I'm using this line of code to get the tweets.
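Per the PySpark documentation, `socketTextStream` interprets the received bytes as UTF-8 encoded, `\n` delimited lines, so each element of `lines` is one plain string: exactly what the program on the other end of port 5550 wrote, and nothing more. If that program writes only the tweet text, no other field can reach Spark. The stdlib-only sketch below mimics that contract, using `socket.socketpair()` as a stand-in for the real TCP connection; `send_lines` and `read_lines` are hypothetical helper names, not part of Spark or Tweepy.

```python
import socket
from typing import List


def send_lines(sock: socket.socket, lines: List[str]) -> None:
    # Same framing as a typical tweet listener: one UTF-8 encoded line per record
    for line in lines:
        sock.sendall((line + "\n").encode("utf-8"))


def read_lines(sock: socket.socket) -> List[str]:
    # Read until the peer closes, then split on newlines, which is how
    # socketTextStream turns the byte stream into one string per element
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks).decode("utf-8").splitlines()


sender, receiver = socket.socketpair()
send_lines(sender, ["first tweet text", "second tweet text"])
sender.close()
received = read_lines(receiver)
receiver.close()
print(received)  # ['first tweet text', 'second tweet text']
```

Only the strings the sender chose to write arrive on the receiving side; any extra field such as the location has to be added to those lines by the sender.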

Answers (1)

ak_lawati

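I just found the answer: the Twitter listener needs to be updated. The location is in the `user` object of each tweet's JSON (`msg['user']['location']`), and the listener is the only place that sees that JSON; Spark only receives the lines the listener writes to the socket.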

import json

def on_data(self, data):
    try:
        msg = json.loads(data)
        # For retweets, the full text lives on the original tweet
        tweet = msg.get('retweeted_status', msg)
        # Tweets longer than 140 characters carry their full text in 'extended_tweet'
        if 'extended_tweet' in tweet:
            text = tweet['extended_tweet']['full_text']
        else:
            text = tweet['text']
        # Free-text location from the sender's profile (may be None);
        # tweet-level geotags are in 'place' / 'coordinates' instead
        location = msg['user']['location']
        username = msg['user']['screen_name']
        print(text)
        print(" | The Location is " + str(location))
        # Send all three fields as one JSON line; json.dumps escapes newlines
        # inside the tweet, so each record stays on a single line for Spark
        record = {'text': text, 'location': location, 'username': username}
        self.client_socket.sendall((json.dumps(record) + "\n").encode('utf-8'))
    except Exception as e:
        # Non-tweet messages (e.g. delete notices) have no 'text'/'user' keys
        print("Error on_data: %s" % str(e))

    return True
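On the Spark side, the lines then have to be parsed back into separate fields. A minimal sketch, assuming the listener sends each tweet as one JSON object per line with `text`, `location` and `username` keys (that key layout is an assumption chosen for this example, not something PySpark defines); `parse_tweet` and `tweet_fields` are hypothetical helper names:

```python
import json
from typing import Optional, Tuple

TweetFields = Tuple[Optional[str], Optional[str], Optional[str]]


def parse_tweet(line: str) -> Optional[TweetFields]:
    """Turn one JSON line from the listener into (text, location, username)."""
    try:
        record = json.loads(line)
    except ValueError:
        # Skip partial or non-JSON lines instead of failing the whole batch
        return None
    if not isinstance(record, dict):
        return None
    # Missing keys come back as None rather than raising KeyError
    return record.get("text"), record.get("location"), record.get("username")


def tweet_fields(lines):
    """Apply parse_tweet to a DStream of lines and drop unparseable ones."""
    return lines.map(parse_tweet).filter(lambda fields: fields is not None)
```

With the `lines` DStream from the question, `tweet_fields(lines).pprint()` would print `(text, location, username)` tuples for each batch. Keeping `parse_tweet` as a plain function means it can be checked without a running Spark cluster.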
