Reputation: 31
I gathered live tweets for an hour before the recent Champions League match between Juventus and Atlético Madrid.
import tweepy
from tweepy import Stream
from tweepy.streaming import StreamListener

# Set up tweepy
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
translator = Translator()  # import not shown in the original post; unused below

# Gather live tweets with probable hashtags for the fixture for an hour before the game starts
class Listener(StreamListener):
    def on_data(self, status):
        # status is the raw JSON string sent by the stream
        print(status)
        with open('Juve_vs_AthMadrid.json', 'a') as f:
            f.write(status)
        return True

    def on_error(self, status):
        print(status)
        return True

twitter_stream = Stream(auth, Listener())
twitter_stream.filter(track=['#Juve', '#juve', '#JuveAtleti', '#turin',
                             '#AúpaAtleti', '#ForzaJuve', '#AtléticosAroundTheWorld!', '#VamosAtleti',
                             '#AtléticosPorElMundo'])
Next, I proceeded to clean the data. I read the file into a list in which each element is one tweet dictionary (as a string), and tried to convert these strings into actual Python dictionaries using the json.loads function:
import json

with open('Juve_vs_AthMadrid.json', 'r') as handle:
    file = handle.readlines()

dic_list = []
for dic_str in file:
    dic_list.append(json.loads(dic_str))
However, I keep getting a JSONDecodeError (raise JSONDecodeError("Expecting value", s, err.value) from None) on the line dic_list.append(json.loads(dic_str)).
Upvotes: 1
Views: 11141
Reputation: 13661
Example of reading a JSON file and storing the data in a Python dictionary:
example.py:
import json

with open("example.json", "r") as json_data:
    data = json.loads(json_data.read())

print(type(data))
print(data)
example.json:
{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}
Output:
<class 'dict'>
{'glossary': {'GlossDiv': {'GlossList': {'GlossEntry': {'SortAs': 'SGML', 'Abbrev': 'ISO 8879:1986', 'ID': 'SGML', 'GlossTerm': 'Standard Generalized Markup Language', 'GlossDef': {'GlossSeeAlso': ['GML', 'XML'], 'para': 'A meta-markup language, used to create markup languages such as DocBook.'}, 'GlossSee': 'markup', 'Acronym': 'SGML'}}, 'title': 'S'}, 'title': 'example glossary'}}
The output shows that data is of type dict.
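The file in the question is a different shape from example.json: it holds one JSON object per line rather than a single document. One plausible cause of the "Expecting value" error, which I have not verified against the asker's file, is that some lines are blank (for example keep-alive newlines from the stream), since json.loads raises exactly that error on an empty or whitespace-only string. Below is a minimal sketch that parses such a file line by line and skips blank or unparsable lines. The helper name load_json_lines is hypothetical, not part of the json module.

```python
import json


def load_json_lines(path):
    """Parse a file holding one JSON object per line.

    Blank lines are skipped; lines that still fail to parse are
    reported and skipped instead of aborting the whole load.
    """
    records = []
    with open(path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                # json.loads("") raises "Expecting value", so skip blanks
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                print(f"Skipping line {line_number}: {err}")
    return records
```

Under that assumption, dic_list = load_json_lines('Juve_vs_AthMadrid.json') would give a list of dictionaries, one per tweet. If lines are reported as skipped, printing repr() of one of them is a quick way to see what the stream actually wrote.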
Upvotes: 3