Reputation: 59
I am curious as to why when I create a data frame in the manner below, using lists to create the values in the rows does not graph and gives me the error "ValueError: x must be a label or position"
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
values = [9.83, 19.72, 7.19, 3.04]
values
[9.83, 19.72, 7.19, 3.04]
cols = ['Condition', 'No-Show']
conditions = ['Scholarship', 'Hipertension', 'Diabetes', 'Alcoholism']
df = pd.DataFrame(columns = [cols])
df['Condition'] = conditions
df['No-Show'] = values
df
Condition No-Show
0 Scholarship 9.83
1 Hipertension 19.72
2 Diabetes 7.19
3 Alcoholism 3.04
df.plot(kind='bar', x='Condition', y='No-Show');
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [17], in <cell line: 1>()
----> 1 df.plot(kind='bar', x='Condition', y='No-Show')
File ~\anaconda3\lib\site-packages\pandas\plotting\_core.py:938, in
PlotAccessor.__call__(self, *args, **kwargs)
936 x = data_cols[x]
937 elif not isinstance(data[x], ABCSeries):
--> 938 raise ValueError("x must be a label or position")
939 data = data.set_index(x)
940 if y is not None:
941 # check if we have y as int or list of ints
ValueError: x must be a label or position
Yet if I create the same DataFrame a different way, it graphs just fine....
df2 = pd.DataFrame({'Condition': ['Scholarship', 'Hipertension', 'Diatebes', 'Alcoholism'],
'No-Show': [9.83, 19.72, 7.19, 3.04]})
df2
Condition No-Show
0 Scholarship 9.83
1 Hipertension 19.72
2 Diatebes 7.19
3 Alcoholism 3.04
df2.plot(kind='bar', x='Condition', y='No-Show')
plt.ylim(0, 50)
#graph appears here just fine
Can someone enlighten me why it works the second way and not the first? I am a new student and am confused. I appreciate any insight.
Upvotes: 0
Views: 616
Reputation: 153510
Let's look at pd.DataFrame.info for both dataframes.
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 (Condition,) 4 non-null object
1 (No-Show,) 4 non-null float64
dtypes: float64(1), object(1)
memory usage: 192.0+ bytes
Note, your column headers are tuples with a empty second element.
Now, look at info for df2.
df2.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Condition 4 non-null object
1 No-Show 4 non-null float64
dtypes: float64(1), object(1)
memory usage: 192.0+ bytes
Note your column headers here are strings.
As, @BigBen states in his comment you don't need the extra brackets in your dataframe constructor for df.
FYI... to fix your statement with the incorrect dataframe constructor for df.
df.plot(kind='bar', x=('Condition',), y=('No-Show',))
Upvotes: 1