Reputation: 101
I am working through this guide and am stuck: https://app.pluralsight.com/guides/machine-learning-concepts-python-and-scikit-learn
Can somebody explain how to fix the problem in the code below?
import pandas as pd
wine_data_frame = pd.DataFrame(data=wine_data['data'], columns=wine_data['feature_names'])
wine_data_frame['class'] = wine_data['target']
wine_classes = [wine_data_frame[wine_data_frame['class'] == x for x in range(3)]
I get an "invalid syntax" error on the wine_classes line and have not been able to fix it. Can anybody help?
Upvotes: 0
Views: 1038
Reputation: 153510
You are missing a closing bracket after x. The code on that site has a typo.
wine_classes = [wine_data_frame[wine_data_frame['class'] == x] for x in range(3)]
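To see why the original line fails, here is a minimal check using the standard library's ast.parse. The two strings are the line from the question and the corrected line above; the names broken, fixed, and parses are only for illustration.

```python
import ast

# The line from the question: the boolean-mask index opened after
# wine_data_frame is not closed before the "for" clause.
broken = "wine_classes = [wine_data_frame[wine_data_frame['class'] == x for x in range(3)]"

# The corrected line: a ] after x closes the boolean-mask index.
fixed = "wine_classes = [wine_data_frame[wine_data_frame['class'] == x] for x in range(3)]"


def parses(source: str) -> bool:
    """Return True if source is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


broken_ok = parses(broken)
fixed_ok = parses(fixed)
print(broken_ok, fixed_ok)
```

Parsing only checks syntax, so this runs without pandas or the wine data being loaded.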
Complete code:
import pandas as pd
from sklearn.datasets import load_wine
wine_data = load_wine()
wine_data_frame = pd.DataFrame(data=wine_data['data'], columns=wine_data['feature_names'])
wine_data_frame['class'] = wine_data['target']
wine_classes = [wine_data_frame[wine_data_frame['class'] == x] for x in range(3)]
testing_data = []
for wine_class in wine_classes:
    row = wine_class.sample()
    testing_data.append(row)
    wine_data_frame = wine_data_frame.drop(row.index)
Output (testing_data):
[ alcohol malic_acid ash alcalinity_of_ash magnesium total_phenols flavanoids nonflavanoid_phenols proanthocyanins color_intensity hue od280/od315_of_diluted_wines proline class
2 13.16 2.36 2.67 18.6 101.0 2.8 3.24 0.3 2.81 5.68 1.03 3.17 1185.0 0,
 alcohol malic_acid ash alcalinity_of_ash magnesium total_phenols flavanoids nonflavanoid_phenols proanthocyanins color_intensity hue od280/od315_of_diluted_wines proline class
91 12.0 1.51 2.42 22.0 86.0 1.45 1.25 0.5 1.63 3.6 1.05 2.65 450.0 1,
 alcohol malic_acid ash alcalinity_of_ash magnesium total_phenols flavanoids nonflavanoid_phenols proanthocyanins color_intensity hue od280/od315_of_diluted_wines proline class
150 13.5 3.12 2.62 24.0 123.0 1.4 1.57 0.22 1.25 8.6 0.59 1.3 500.0 2]
and wine_data_frame:
alcohol malic_acid ash alcalinity_of_ash magnesium total_phenols flavanoids nonflavanoid_phenols proanthocyanins color_intensity hue od280/od315_of_diluted_wines proline class
0 14.23 1.71 2.43 15.6 127.0 2.80 3.06 0.28 2.29 5.64 1.04 3.92 1065.0 0
1 13.20 1.78 2.14 11.2 100.0 2.65 2.76 0.26 1.28 4.38 1.05 3.40 1050.0 0
3 14.37 1.95 2.50 16.8 113.0 3.85 3.49 0.24 2.18 7.80 0.86 3.45 1480.0 0
4 13.24 2.59 2.87 21.0 118.0 2.80 2.69 0.39 1.82 4.32 1.04 2.93 735.0 0
5 14.20 1.76 2.45 15.2 112.0 3.27 3.39 0.34 1.97 6.75 1.05 2.85 1450.0 0
.. ... ... ... ... ... ... ... ... ... ... ... ... ... ...
173 13.71 5.65 2.45 20.5 95.0 1.68 0.61 0.52 1.06 7.70 0.64 1.74 740.0 2
174 13.40 3.91 2.48 23.0 102.0 1.80 0.75 0.43 1.41 7.30 0.70 1.56 750.0 2
175 13.27 4.28 2.26 20.0 120.0 1.59 0.69 0.43 1.35 10.20 0.59 1.56 835.0 2
176 13.17 2.59 2.37 20.0 120.0 1.65 0.68 0.53 1.46 9.30 0.60 1.62 840.0 2
177 14.13 4.10 2.74 24.5 96.0 2.05 0.76 0.56 1.35 9.20 0.61 1.60 560.0 2
[175 rows x 14 columns]
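As an aside, the sample-and-drop loop can also be written without an explicit loop. The sketch below assumes pandas 1.1 or later (for sample on a groupby) and uses a small made-up frame in place of the wine data; the values and the name training_data are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for wine_data_frame: six rows, three classes.
df = pd.DataFrame({
    "alcohol": [14.2, 13.2, 12.4, 12.0, 13.7, 13.4],
    "class": [0, 0, 1, 1, 2, 2],
})

# Draw one random row from each class in a single call.
testing_data = df.groupby("class").sample(n=1, random_state=0)

# Remove the sampled rows so they are not used for training.
training_data = df.drop(testing_data.index)

print(testing_data)
print(training_data)
```

Here testing_data is a single DataFrame with one row per class rather than a list of one-row DataFrames, which may or may not suit the rest of the guide.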
Upvotes: 1
Reputation: 4284
First, add a closing bracket after == x. Second, you can put .loc before the bracket to make the row selection explicit, although plain [] indexing with a boolean mask also works.
wine_classes = [wine_data_frame.loc[wine_data_frame['class'] == x] for x in range(3)]
This will give you a list of three DataFrames, one per class.
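A small check on made-up data (the values below are hypothetical, not the wine dataset) suggests that the .loc form and the plain [] form select the same rows when given a boolean mask:

```python
import pandas as pd

# Hypothetical stand-in for wine_data_frame.
wine_data_frame = pd.DataFrame({
    "alcohol": [14.2, 13.2, 12.4, 12.0, 13.7],
    "class": [0, 0, 1, 2, 2],
})

# Row selection with a boolean mask, with and without .loc.
with_loc = [wine_data_frame.loc[wine_data_frame["class"] == x] for x in range(3)]
without_loc = [wine_data_frame[wine_data_frame["class"] == x] for x in range(3)]

# Compare the two lists frame by frame.
same = all(a.equals(b) for a, b in zip(with_loc, without_loc))
sizes = [len(frame) for frame in with_loc]
print(same, sizes)
```

Which spelling to use is mostly a matter of style; .loc makes it clearer that rows are being selected by label or mask.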
Upvotes: 1
Reputation: 11
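I am assuming wine_data is already a DataFrame with a "target" column. Note that isin returns a single filtered DataFrame, not a list of per-class DataFrames.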
wine_data_frame = wine_data_frame.rename(columns = {'target':'class'})
wine_classes = wine_data_frame[wine_data_frame['class'].isin(range(3))]
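If a list with one DataFrame per class is what is needed, groupby is one way to get it. The sketch below uses made-up data (hypothetical values) and the name filtered for the isin result, to contrast the two approaches:

```python
import pandas as pd

# Hypothetical stand-in for wine_data_frame after the rename.
wine_data_frame = pd.DataFrame({
    "alcohol": [14.2, 13.2, 12.4, 12.0, 13.7],
    "class": [0, 0, 1, 2, 2],
})

# isin keeps every row whose class is 0, 1, or 2, all in one DataFrame.
filtered = wine_data_frame[wine_data_frame["class"].isin(range(3))]

# groupby yields one sub-frame per class, collected here into a list.
wine_classes = [group for _, group in wine_data_frame.groupby("class")]

print(len(filtered))
print([len(group) for group in wine_classes])
```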
Upvotes: 0