Wildo_Baggins311

Reputation: 59

Pandas Data Frame Graphing Issue

I am curious why a data frame created in the manner below, using lists to fill in the values, does not plot and instead gives me the error "ValueError: x must be a label or position".

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

values = [9.83, 19.72, 7.19, 3.04]
values
[9.83, 19.72, 7.19, 3.04]
cols = ['Condition', 'No-Show']
conditions = ['Scholarship', 'Hipertension', 'Diabetes', 'Alcoholism']
df = pd.DataFrame(columns = [cols])
df['Condition'] = conditions
df['No-Show'] = values
df
Condition   No-Show
0   Scholarship 9.83
1   Hipertension    19.72
2   Diabetes    7.19
3   Alcoholism  3.04
df.plot(kind='bar', x='Condition', y='No-Show');

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [17], in <cell line: 1>()
----> 1 df.plot(kind='bar', x='Condition', y='No-Show')

File ~\anaconda3\lib\site-packages\pandas\plotting\_core.py:938, in PlotAccessor.__call__(self, *args, **kwargs)
    936         x = data_cols[x]
    937     elif not isinstance(data[x], ABCSeries):
--> 938         raise ValueError("x must be a label or position")
    939     data = data.set_index(x)
    940 if y is not None:
    941     # check if we have y as int or list of ints

    ValueError: x must be a label or position

Yet if I create the same DataFrame a different way, it graphs just fine....

df2 = pd.DataFrame({'Condition': ['Scholarship', 'Hipertension', 'Diabetes', 'Alcoholism'], 
'No-Show': [9.83, 19.72, 7.19, 3.04]})
df2
Condition   No-Show
0   Scholarship 9.83
1   Hipertension    19.72
2   Diabetes    7.19
3   Alcoholism  3.04
df2.plot(kind='bar', x='Condition', y='No-Show')
plt.ylim(0, 50)
#graph appears here just fine

Can someone enlighten me why it works the second way and not the first? I am a new student and am confused. I appreciate any insight.

Upvotes: 0

Views: 616

Answers (1)

Scott Boston

Reputation: 153510

Let's look at pd.DataFrame.info for both dataframes.

df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   (Condition,)  4 non-null      object 
 1   (No-Show,)    4 non-null      float64
dtypes: float64(1), object(1)
memory usage: 192.0+ bytes

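Note that your column headers are one-element tuples. Passing columns = [cols] wraps the list in another list, so pandas builds a MultiIndex instead of plain string labels.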
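A quick way to confirm this is to inspect the columns directly. This is a minimal sketch; the names nested and flat are made up for illustration and are not from the question.

```python
import pandas as pd

cols = ['Condition', 'No-Show']

# Extra brackets: a list of lists is treated as the levels of a MultiIndex
nested = pd.DataFrame(columns=[cols])

# No extra brackets: a flat list gives ordinary string labels
flat = pd.DataFrame(columns=cols)

print(type(nested.columns).__name__)  # MultiIndex
print(list(nested.columns))           # one-element tuples
print(list(flat.columns))             # plain strings
```

The tuple labels are why x='Condition' is not found as a single column label in the first frame.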

Now, look at info for df2.

df2.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Condition  4 non-null      object 
 1   No-Show    4 non-null      float64
dtypes: float64(1), object(1)
memory usage: 192.0+ bytes

Note that your column headers here are plain strings.

As @BigBen states in his comment, you don't need the extra brackets in your DataFrame constructor for df: cols is already a list, so pass columns = cols.
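As a sketch of that fix, here is the same frame built with columns=cols. The variable name fixed is hypothetical. The isinstance check at the end mirrors the test visible in the traceback (data[x] must be a Series), so it is only a sanity check, not the plotting call itself.

```python
import pandas as pd

values = [9.83, 19.72, 7.19, 3.04]
conditions = ['Scholarship', 'Hipertension', 'Diabetes', 'Alcoholism']
cols = ['Condition', 'No-Show']

# Pass the list itself, not a list containing the list
fixed = pd.DataFrame(columns=cols)
fixed['Condition'] = conditions
fixed['No-Show'] = values

# With string labels, selecting one column yields a Series,
# which is what the check in the traceback expects
print(list(fixed.columns))
print(isinstance(fixed['Condition'], pd.Series))
```

With the columns labelled this way, the original call df.plot(kind='bar', x='Condition', y='No-Show') should be able to locate both columns.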

FYI, if you keep the incorrect DataFrame constructor for df, you can still make the plot call work by passing the tuple labels:

df.plot(kind='bar', x=('Condition',), y=('No-Show',))
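Another option, if the frame already exists with tuple labels, is to flatten the columns afterwards. This is a sketch under the assumption that the MultiIndex has a single level, as it does here. The frame is rebuilt from rows for illustration (the name df_nested is made up), and get_level_values(0) extracts the strings from that one level.

```python
import pandas as pd

values = [9.83, 19.72, 7.19, 3.04]
conditions = ['Scholarship', 'Hipertension', 'Diabetes', 'Alcoholism']
cols = ['Condition', 'No-Show']

# Reproduce the tuple-labelled frame (extra brackets around cols)
df_nested = pd.DataFrame(list(zip(conditions, values)), columns=[cols])

# Replace the one-level MultiIndex with its plain string labels
df_nested.columns = df_nested.columns.get_level_values(0)

print(list(df_nested.columns))
```

After this, the string labels 'Condition' and 'No-Show' can be used for x and y as in the df2 example.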

Upvotes: 1
