MPir786

Reputation: 1

TypeError: Empty 'DataFrame': no numeric data to plot

I'm new to coding and still learning. I've been following a tutorial on analyzing data from the Twitter API: http://adilmoujahid.com/posts/2014/07/twitter-analytics/

I believe the author is using Python 2.7 while I am using Python 3.6.1, so I converted the code to my version. It worked until I got to the top 5 countries graph. The code for that graph ran successfully exactly once, two days ago; now it only produces the following error message:

"---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-47-601663476327> in <module>()
          7 ax.set_ylabel('Number of tweets' , fontsize=15)
          8 ax.set_title('Top 5 countries', fontsize=15, fontweight='bold')
    ----> 9 tweets_by_country[:5].plot(ax=ax, kind='bar', color='blue')
          10 plt.show()
~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in __call__(self, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
   2441                            colormap=colormap, table=table, yerr=yerr,
   2442                            xerr=xerr, label=label,  secondary_y=secondary_y,
-> 2443                            **kwds)
   2444     __call__.__doc__ = plot_series.__doc__
   2445 

~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in plot_series(data, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
   1882                  yerr=yerr, xerr=xerr,
   1883                  label=label, secondary_y=secondary_y,
-> 1884                  **kwds)
   1885 
   1886 

~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in _plot(data, x, y, subplots, ax, kind, **kwds)
   1682         plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
   1683 
-> 1684     plot_obj.generate()
   1685     plot_obj.draw()
   1686     return plot_obj.result

~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in generate(self)
    236     def generate(self):
    237         self._args_adjust()
--> 238         self._compute_plot_data()
    239         self._setup_subplots()
    240         self._make_plot()

~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in _compute_plot_data(self)
    345         if is_empty:
    346             raise TypeError('Empty {0!r}: no numeric data to '
--> 347                             'plot'.format(numeric_data.__class__.__name__))
    348 
    349         self.data = numeric_data

    TypeError: Empty 'DataFrame': no numeric data to plot"

Has anyone else encountered this, and what's the best way to fix it? I can't figure it out. Thank you!

Entire Code (to date)

import json
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt

tweets_data_path = '...twitter_data.txt'

tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
    try:
        tweet = json.loads(line)
        tweets_data.append(tweet)
    except:
        continue

print(len(tweets_data))

tweets = pd.DataFrame()

tweets['text'] = list(map(lambda tweet: tweet['text'], tweets_data))
tweets['lang'] = list(map(lambda tweet: tweet['lang'], tweets_data))
tweets['country'] = list(map(lambda tweet: tweet['place']['country'] if tweet['place'] is not None else None, tweets_data))

graph by top 5 languages

tweets_by_lang = tweets['lang'].value_counts()

fig, ax = plt.subplots()
ax.tick_params(axis='x', labelsize=15)
ax.tick_params(axis='y', labelsize=10)
ax.set_xlabel('Languages', fontsize=15)
ax.set_ylabel('Number of tweets' , fontsize=15)
ax.set_title('Top 5 languages', fontsize=15, fontweight='bold')
tweets_by_lang[:5].plot(ax=ax, kind='bar', color='red')
plt.show()

graph by top 5 countries

tweets_by_country = tweets['country'].value_counts()

fig, ax = plt.subplots() 
ax.tick_params(axis='x', labelsize=15)
ax.tick_params(axis='y', labelsize=10)
ax.set_xlabel('Countries', fontsize=15)
ax.set_ylabel('Number of tweets' , fontsize=15)
ax.set_title('Top 5 countries', fontsize=15, fontweight='bold')
tweets_by_country[:5].plot(ax=ax, kind='bar', color='blue')
plt.show()

Upvotes: 0

Views: 9404

Answers (2)

Dave B.

Reputation: 151

Is your data actually numeric? You can check using, for example,

print(type(tweets['country'][0]))

Given that you're using json.loads (deserializing from a string), the values are very likely not numeric, which may be what the error is referring to. Try converting the data to a numeric type such as float:

tweets = tweets.astype('float')

and see if that solves the problem. You can also apply astype to specific columns only if you want. Good luck!
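One more thing that may be worth checking, since value_counts() returns counts rather than the raw column values: whether the counts Series is empty. The sketch below is only an illustration under an assumption, namely that none of the collected tweets carries a place, so every entry in the country column is None. The small DataFrame is a hypothetical stand-in for the one in the question, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame in the question.
# Assumption: no tweet has a 'place', so every country is None.
tweets = pd.DataFrame({
    'lang': ['en', 'en', 'fr'],
    'country': [None, None, None],
})

# value_counts() drops missing values by default, so an all-None
# column yields a Series with no rows at all.
tweets_by_country = tweets['country'].value_counts()

print(len(tweets_by_country))             # number of distinct countries found
print(tweets['country'].notnull().sum())  # number of tweets that have a country

# Guard the plot so an empty result is reported instead of raising.
if tweets_by_country.empty:
    print('No tweets with a country; nothing to plot.')
```

If the length is 0 on the real data, the plot has nothing to draw, and casting types is unlikely to help; the question then becomes why no tweet in the file has a place.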

Upvotes: 1

Linda

Reputation: 647

I think your file isn't present or there is a path issue. The first two steps of the tutorial (http://adilmoujahid.com/posts/2014/07/twitter-analytics/) retrieve the data and save it to a local file. Is the file present at the specified path?

    tweets_data_path = '...twitter_data.txt'

What does the following print?

    print (len (tweets_data)) 
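The checks above could be sketched roughly as follows. This is a minimal sketch, not the tutorial's code: load_tweets and count_with_place are hypothetical helper names introduced here, and the path is the placeholder from the question, left exactly as posted. Unlike a bare except, it counts the lines that fail to parse so they are not skipped silently.

```python
import json
import os


def load_tweets(path):
    """Return (tweets, skipped): parsed JSON objects and the number of unparsable lines."""
    tweets, skipped = [], 0
    with open(path, 'r') as tweets_file:
        for line in tweets_file:
            line = line.strip()
            if not line:
                continue  # ignore blank lines between tweets
            try:
                tweets.append(json.loads(line))
            except ValueError:  # json.JSONDecodeError is a subclass of ValueError
                skipped += 1
    return tweets, skipped


def count_with_place(tweets):
    """Count tweets whose 'place' field is present and not None."""
    return sum(1 for t in tweets
               if isinstance(t, dict) and t.get('place') is not None)


# Placeholder path copied from the question; substitute the real location.
tweets_data_path = '...twitter_data.txt'

if os.path.exists(tweets_data_path):
    tweets_data, skipped = load_tweets(tweets_data_path)
    print(len(tweets_data), 'tweets parsed,', skipped, 'lines skipped')
    print(count_with_place(tweets_data), 'tweets have a place')
else:
    print('File not found:', tweets_data_path)
```

If the file is missing, the last branch says so directly. If it is present but the count of tweets with a place is 0, the country column will be all None.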

Upvotes: 0
