Reputation: 123
I wanted to know the difference between these two lines of code
X_train = training_dataset.iloc[:, 1].values
X_train = training_dataset.iloc[:, 1:2].values
My guess is that the latter is a 2-D numpy array and the former is a 1-D numpy array. For inputs in a neural network, the latter is the proper way for input data, is there are specific reason for that?
Please help!
Upvotes: 0
Views: 1301
Reputation: 17156
Difference is iloc returns a Series with a single row or column is selected but a Dataframe with a multiple row or column ranges reference
Although they both refer to column 1, 1 and 1:2 are different types, with 1 representing an int and 1:2 representing a slice.
With,
X_train = training_dataset.iloc[:, 1].values
You specify a single column so training_dataset.iloc[:, 1] is a Pandas Series, so .values is a 1D Numpy array
Vs.,
X_train = training_dataset.iloc[:, 1:2].values
Although it becomes one column, [1:2] is a slice you represents a column range so training_dataset.iloc[:, 1:2] is a Pandas Dataframe. Thus, .values is a 2D Numpy array
Test as follows:
Create training_dataset Dataframe
data = {'Height':[1, 14, 2, 1, 5], 'Width':[15, 25, 2, 20, 27]}
training_dataset = pd.DataFrame(data)
Using .iloc[:, 1]
print(type(training_dataset.iloc[:, 1]))
print(training_dataset.iloc[:, 1].values)
# Result is:
<class 'pandas.core.series.Series'>
# Values returns a 1D Numpy array
0 15
1 25
2 2
3 20
4 27
Name: Width, dtype: int64,
Using iloc[:, 1:2]
print(type(training_dataset.iloc[:, 1:2]))
print(training_dataset.iloc[:, 1:2].values)
# Result is:
<class 'pandas.core.frame.DataFrame'>
# Values is a 2D Numpy array (since values of Pandas Dataframe)
[[15]
[25]
[ 2]
[20]
[27]],
X_train Values Var Type <class 'numpy.ndarray'>
Upvotes: 1
Reputation: 4618
Not quite that, they have both have ndim=2, just check by doing this:
X_train.ndim
The difference is that in the second one it doesn't have a defined second dimension if you want to see the difference between the shapes I suggest reading this: Difference between numpy.array shape (R, 1) and (R,)
Upvotes: 1