Reputation: 327
I am using matplotlib to plot bar charts of data in my DataFrame. I use this construction to first plot over the whole dataset:
import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt
Temp_Counts = Counter(weatherDFConcat['TEMPBIN_CONS'])
df = pd.DataFrame.from_dict(Temp_Counts, orient = 'index').sort_index()
df.plot(kind = 'bar', title = '1969-2015 National Temp Bins', legend = False, color = ['r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b','r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g' ] )
Now I would like to plot the same column, but over a particular subset of the data: for each region in the 'REGION_NAME' column I would like to generate the bar plot. Here is an example of my DataFrame.
My attempted solution is to write:
if weatherDFConcat['REGION_NAME'].any() == 'South':
    Temp_Counts = Counter(weatherDFConcat['TEMPBIN_CONS'])
    df = pd.DataFrame.from_dict(Temp_Counts, orient = 'index').sort_index()
    df.plot(kind = 'bar', title = '1969-2015 National Temp Bins', legend = False, color = ['r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b','r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g' ] )
plt.show()
When I run this code, it oddly only works for the 'South' region. For 'South' the plot is generated, but for any other region I try, the code runs without an error message and the plot never shows up. Running my code for any region other than 'South' produces this result in the console.
The South region is the first part in my DataFrame, which is 40 million lines long, with other regions being further down. Could the size of the DataFrame I'm trying to plot have anything to do with this?
Upvotes: 3
Views: 9369
Reputation: 20811
If I'm understanding your question correctly, you are trying to do two things prior to plotting:
1. Filter based on REGION_NAME.
2. Within that filtered dataframe, count how many times each value in the TEMPBIN_CONS column appears.
You can do both of those things right within pandas:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'STATE_NAME': ['Alabama', 'Florida', 'Maine', 'Delaware', 'New Jersey'],
                   'GEOID': [1, 2, 3, 4, 5],
                   'TEMPBIN_CONS': ['-3 to 0', '-3 to 0', '0 to 3', '-3 to 0', '0 to 3'],
                   'REGION_NAME': ['South', 'South', 'Northeast', 'Northeast', 'Northeast']},
                  columns=['STATE_NAME', 'GEOID', 'TEMPBIN_CONS', 'REGION_NAME'])
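# Keep only the rows whose REGION_NAME is 'Northeast'
df_northeast = df[df['REGION_NAME'] == 'Northeast']
# Count how many times each TEMPBIN_CONS value appears in that subset
northeast_count = df_northeast.groupby('TEMPBIN_CONS').size()
print(df)
print(df_northeast)
print(northeast_count)
# Bar chart of the per-bin counts
northeast_count.plot(kind='bar')
plt.show()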
output:
   STATE_NAME  GEOID TEMPBIN_CONS REGION_NAME
0     Alabama      1      -3 to 0       South
1     Florida      2      -3 to 0       South
2       Maine      3       0 to 3   Northeast
3    Delaware      4      -3 to 0   Northeast
4  New Jersey      5       0 to 3   Northeast
   STATE_NAME  GEOID TEMPBIN_CONS REGION_NAME
2       Maine      3       0 to 3   Northeast
3    Delaware      4      -3 to 0   Northeast
4  New Jersey      5       0 to 3   Northeast
TEMPBIN_CONS
-3 to 0    1
0 to 3     2
dtype: int64
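If you want a plot for every region rather than just one, the same filter-then-count idea can probably be wrapped in a loop over `groupby('REGION_NAME')`. The following is only a minimal sketch on toy data: the helper names `counts_per_region` and `plot_counts_per_region` are made up for illustration, and the plot titles are an assumption about what you would want to display.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data reusing the column names from the question.
df = pd.DataFrame({
    'STATE_NAME': ['Alabama', 'Florida', 'Maine', 'Delaware', 'New Jersey'],
    'TEMPBIN_CONS': ['-3 to 0', '-3 to 0', '0 to 3', '-3 to 0', '0 to 3'],
    'REGION_NAME': ['South', 'South', 'Northeast', 'Northeast', 'Northeast'],
})


def counts_per_region(frame):
    """Hypothetical helper: map each region to its TEMPBIN_CONS counts."""
    return {
        region: group.groupby('TEMPBIN_CONS').size()
        for region, group in frame.groupby('REGION_NAME')
    }


def plot_counts_per_region(frame):
    """Hypothetical helper: draw one bar chart per region, return the axes."""
    axes = []
    for region, counts in counts_per_region(frame).items():
        fig, ax = plt.subplots()
        counts.plot(kind='bar', ax=ax, title='{} Temp Bins'.format(region))
        axes.append(ax)
    return axes


# Per-region counts, computed without drawing anything yet.
region_counts = counts_per_region(df)
print(region_counts['South'])
```

Calling `plot_counts_per_region(df)` followed by `plt.show()` should then give one figure per region. Grouping once avoids writing a separate filter line for each region name.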
Upvotes: 2