Reputation: 1
I'm new to coding and still learning. I've been following a tutorial on analyzing data from the Twitter API: http://adilmoujahid.com/posts/2014/07/twitter-analytics/
I believe the author is using Python 2.7 while I am using Python 3.6.1, so I converted the code to my version. It worked until I got to the top 5 countries graph. That code ran successfully once, two days ago, but now it only produces the following error message:
"---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-47-601663476327> in <module>()
7 ax.set_ylabel('Number of tweets' , fontsize=15)
8 ax.set_title('Top 5 countries', fontsize=15, fontweight='bold')
----> 9 tweets_by_country[:5].plot(ax=ax, kind='bar', color='blue')
10 plt.show()
~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in __call__(self, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
2441 colormap=colormap, table=table, yerr=yerr,
2442 xerr=xerr, label=label, secondary_y=secondary_y,
-> 2443 **kwds)
2444 __call__.__doc__ = plot_series.__doc__
2445
~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in plot_series(data, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
1882 yerr=yerr, xerr=xerr,
1883 label=label, secondary_y=secondary_y,
-> 1884 **kwds)
1885
1886
~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in _plot(data, x, y, subplots, ax, kind, **kwds)
1682 plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
1683
-> 1684 plot_obj.generate()
1685 plot_obj.draw()
1686 return plot_obj.result
~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in generate(self)
236 def generate(self):
237 self._args_adjust()
--> 238 self._compute_plot_data()
239 self._setup_subplots()
240 self._make_plot()
~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in _compute_plot_data(self)
345 if is_empty:
346 raise TypeError('Empty {0!r}: no numeric data to '
--> 347 'plot'.format(numeric_data.__class__.__name__))
348
349 self.data = numeric_data
TypeError: Empty 'DataFrame': no numeric data to plot"
Has anyone else encountered this, and what's the best way to fix it? I can't figure it out. Thank you!
Entire Code (to date)
import json
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
tweets_data_path = '...twitter_data.txt'
tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
    try:
        tweet = json.loads(line)
        tweets_data.append(tweet)
    except:
        continue
print(len(tweets_data))
tweets = pd.DataFrame()
tweets['text'] = list(map(lambda tweet: tweet['text'], tweets_data))
tweets['lang'] = list(map(lambda tweet: tweet['lang'], tweets_data))
tweets['country'] = list(map(lambda tweet: tweet['place']['country'] if tweet['place'] is not None else None, tweets_data))
tweets_by_lang = tweets['lang'].value_counts()
fig, ax = plt.subplots()
ax.tick_params(axis='x', labelsize=15)
ax.tick_params(axis='y', labelsize=10)
ax.set_xlabel('Languages', fontsize=15)
ax.set_ylabel('Number of tweets' , fontsize=15)
ax.set_title('Top 5 languages', fontsize=15, fontweight='bold')
tweets_by_lang[:5].plot(ax=ax, kind='bar', color='red')
plt.show()
tweets_by_country = tweets['country'].value_counts()
fig, ax = plt.subplots()
ax.tick_params(axis='x', labelsize=15)
ax.tick_params(axis='y', labelsize=10)
ax.set_xlabel('Countries', fontsize=15)
ax.set_ylabel('Number of tweets' , fontsize=15)
ax.set_title('Top 5 countries', fontsize=15, fontweight='bold')
tweets_by_country[:5].plot(ax=ax, kind='bar', color='blue')
plt.show()
Upvotes: 0
Views: 9404
Reputation: 151
Is your data actually numeric? You can check using, for example,
print(type(tweets['country'][0]))
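Given that you're using json.loads (deserializing from a string), it's very likely NOT numeric, which may be what the error is referring to. Try converting the data type to float (or whatever numeric type fits); note that this only works if the values are numeric strings, so a column of country names would raise an error: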
tweets = tweets.astype('float')
and see if that solves the problem. You can also apply astype to specific columns only. Good luck!
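A slightly broader check, as a minimal sketch rather than a confirmed diagnosis: look at what value_counts() actually returns before plotting it. The frame below is made-up stand-in data (not the asker's file) in which every 'country' is None, as would happen if none of the collected tweets carried a 'place'. Since value_counts() drops missing values by default, the result in that case is empty, which is one plausible way to end up with nothing numeric to plot.

```python
import pandas as pd

# Hypothetical stand-in for the tweets frame: every 'country' is missing,
# as would happen when none of the collected tweets has a 'place'.
tweets = pd.DataFrame({
    "lang": ["en", "en", "fr"],
    "country": [None, None, None],
})

tweets_by_lang = tweets["lang"].value_counts()
tweets_by_country = tweets["country"].value_counts()  # None/NaN are dropped

# The counts are what gets plotted, so inspect those rather than the raw column.
print(tweets_by_lang.dtype)
print(tweets["country"].notnull().sum())  # rows that actually have a country
print(len(tweets_by_lang), len(tweets_by_country))

# Guard the plot so an empty result gives a readable message instead of a traceback.
if tweets_by_country.empty:
    print("No tweets with a country; nothing to plot.")
```

If the count of non-null countries is zero on the real data, the plotting code is probably fine and the collected tweets simply lack place information.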
Upvotes: 1
Reputation: 647
I think your file isn't present or there is a path issue. The first two steps of http://adilmoujahid.com/posts/2014/07/twitter-analytics/ retrieve the data and save it to a local file. Is the file present at the specified path?
tweets_data_path = '...twitter_data.txt'
What does the following print?
print (len (tweets_data))
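To make both checks explicit, here is a rough sketch. The helper name load_tweets and the throwaway demo file are assumptions for illustration only; on the real data you would pass your own tweets_data_path instead. It fails loudly if the file is missing, counts lines that do not parse as JSON, and reports how many tweets have a 'place' at all.

```python
import json
import os
import tempfile


def load_tweets(path):
    """Return (tweets, skipped): parsed JSON objects and the count of unparsable lines."""
    if not os.path.isfile(path):
        raise FileNotFoundError("No such file: {}".format(path))
    tweets, skipped = [], 0
    with open(path, "r") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                tweets.append(json.loads(line))
            except ValueError:  # json.JSONDecodeError is a subclass of ValueError
                skipped += 1
    return tweets, skipped


# Demo on a throwaway file with made-up content (not real Twitter data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write('{"text": "hi", "lang": "en", "place": null}\n')
    tmp.write("\n")
    tmp.write("not json\n")
    demo_path = tmp.name

tweets_data, skipped = load_tweets(demo_path)
with_place = sum(1 for t in tweets_data if t.get("place") is not None)
print(len(tweets_data), skipped, with_place)
os.remove(demo_path)
```

A length of zero points to the file or its contents; a non-zero length with no places points to the data itself rather than the path.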
Upvotes: 0