Electrino

Reputation: 2890

How to plot a regression tree in Python

I'm relatively new to Python, so I'm not sure how to achieve this. I was following an online tutorial on plotting a decision tree using the Iris dataset (a classification problem). However, I want to plot a single tree from a random forest regressor instead.

Here's a snippet of the data I'm using: Data

Here's the code I was using:

# Import Libraries and Load Data
import pandas as pd 
data = pd.read_csv("/Users/.../Desktop/cars_test.csv") 
import matplotlib.pyplot as plt
import numpy as np
cars = data

# Model
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=10)

# Train
model.fit(cars.data, cars.target)

# Extract single tree for analysis
estimator = model.estimators_[5]

However, I get an error that I'm not sure how to fix:

AttributeError                            Traceback (most recent call last)
<ipython-input-27-37164305d7fe> in <module>()
     10 
     11 # Train
---> 12 model.fit(cars.data, cars.target)
     13 
     14 # Extract single tree for analysis

~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in __getattr__(self, name)
   4370             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   4371                 return self[name]
-> 4372             return object.__getattribute__(self, name)
   4373 
   4374     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'data'

Any suggestions as to what I'm doing wrong?

Upvotes: 0

Views: 593

Answers (1)

Andrew Guy

Reputation: 9968

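You need to adapt the tutorial code to your own data. The Iris dataset in the tutorial was presumably loaded with scikit-learn's load_iris, which returns an object with data and target attributes; the DataFrame returned by pd.read_csv has neither, hence the AttributeError. Instead, extract the matrix of input features (X) and the response variable (y) from your DataFrame yourself. I'm assuming below that th_km_per_year is the column you want to predict, so adapt accordingly.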

# Import libraries and load data
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

cars = pd.read_csv("/Users/.../Desktop/cars_test.csv")

# Split into input features (X) and response variable (y).
# Assumption: 'th_km_per_year' is the column you want to predict.
X = cars.loc[:, cars.columns != 'th_km_per_year'].values
y = cars['th_km_per_year'].values

# Model
model = RandomForestRegressor(n_estimators=10)

# Train
model.fit(X, y)

# Extract single tree for analysis
estimator = model.estimators_[5]
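
Once you have a single estimator, one way to actually draw it is sklearn.tree.plot_tree (available in scikit-learn 0.21 and later). The following is only a sketch under a few assumptions: since I don't have your CSV, it builds a small synthetic DataFrame in its place, and the column names age_years and engine_cc are made up for illustration. I've also set max_depth=3 and random_state=0, which are my own choices to keep the plot readable and reproducible, not something your data requires.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import plot_tree

# Hypothetical stand-in for cars_test.csv; replace with pd.read_csv(...) on your file.
rng = np.random.default_rng(0)
cars = pd.DataFrame({
    "age_years": rng.integers(1, 15, size=200),       # made-up feature
    "engine_cc": rng.integers(900, 3000, size=200),   # made-up feature
    "th_km_per_year": rng.uniform(5, 40, size=200),   # assumed target column
})

# Keep the feature names so the plot can label each split.
feature_names = [c for c in cars.columns if c != "th_km_per_year"]
X = cars[feature_names].values
y = cars["th_km_per_year"].values

# max_depth=3 is an assumption to keep the drawing small enough to read.
model = RandomForestRegressor(n_estimators=10, max_depth=3, random_state=0)
model.fit(X, y)

# Each element of estimators_ is a fitted DecisionTreeRegressor.
estimator = model.estimators_[5]

# Draw the tree; for a regressor, each node shows the split, sample count and mean value.
fig, ax = plt.subplots(figsize=(14, 6))
annotations = plot_tree(
    estimator,
    feature_names=feature_names,
    filled=True,
    rounded=True,
    ax=ax,
)
# Call plt.show() in a script, or fig.savefig(...) to write the figure to disk.
```

Without a depth limit, the individual trees in a random forest tend to grow very deep, so the plot may be unreadable unless you also pass max_depth to plot_tree to truncate the drawing.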

Upvotes: 1
