pavan vamsi

Reputation: 49

What's the difference between df.iloc[:,1:2].values and df.iloc[:,1].values in pandas?

I used x = dataset.iloc[:,1:2].values in my code:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd  # needed for pd.read_csv below
from sklearn.svm import SVR

dataset = pd.read_csv('Position_Salaries.csv')
x = dataset.iloc[:,1:2].values  # look here please
y = dataset.iloc[:,-1].values
sv_regressor = SVR(kernel='rbf')

When I used x = dataset.iloc[:,1].values instead, I got an error saying

'expected 2d array and got 1d array instead'

The error is raised on the sv_regressor line, which is why I tagged sklearn.

Upvotes: 0

Views: 4236

Answers (2)

Mykola Zotko

Reputation: 17854

The difference is that dataset.iloc[:,1:2] returns a DataFrame, while dataset.iloc[:,1] returns a Series. Calling the attribute values on a DataFrame gives a 2D ndarray; calling it on a Series gives a 1D ndarray. The same holds for any column position, for example the last column (-1):

   A  B  C
0  0  2  0
1  1  0  0
2  1  2  1

Series:

type(df.iloc[:, -1])
# pandas.core.series.Series

df.iloc[:, -1].values.shape
# (3,)

DataFrame:

type(df.iloc[:, -1:])
# pandas.core.frame.DataFrame

df.iloc[:, -1:].values.shape
# (3, 1)

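Slicing with a range (such as -1: or 1:2) is a common trick in machine learning to get a single column as a 2D ndarray in one step, which is the shape scikit-learn estimators expect for the feature matrix.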
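To make the shape difference concrete, here is a minimal sketch. The DataFrame is made-up data standing in for Position_Salaries.csv (the column names and values are assumptions, not the real file), and it shows three ways to end up with a 2D array for a single column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Position_Salaries.csv; names and values are invented.
df = pd.DataFrame({
    "Position": ["Analyst", "Consultant", "Manager", "Partner"],
    "Level": [1, 2, 3, 4],
    "Salary": [45000, 50000, 60000, 80000],
})

x_1d = df.iloc[:, 1].values        # integer index -> Series -> 1D array
x_2d = df.iloc[:, 1:2].values      # slice -> DataFrame -> 2D array
x_list = df.iloc[:, [1]].values    # list of positions -> DataFrame -> 2D array
x_reshaped = x_1d.reshape(-1, 1)   # turn the 1D array into a single column

print(x_1d.shape)        # 1D: one axis of length 4
print(x_2d.shape)        # 2D: 4 rows, 1 column
print(x_list.shape)      # 2D: 4 rows, 1 column
print(x_reshaped.shape)  # 2D: 4 rows, 1 column
```

Any of the three 2D variants should be accepted as the feature matrix; the 1D version is what triggers the "expected 2d array" error.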

Upvotes: 3

Martín Alcubierre

Reputation: 4397

The data is the same, but the shape differs: dataset.iloc[:,1:2] gives you a 2-D DataFrame (the slice 1:2 selects only column 1, because the end of the slice is exclusive), while dataset.iloc[:,1] gives you a pandas Series (1-D) holding column 1.
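A short sketch of the slice semantics, reusing the small A/B/C example frame from the other answer (the variable names are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 1], "B": [2, 0, 2], "C": [0, 0, 1]})

one_col = df.iloc[:, 1:2]   # end is exclusive, so only column position 1
two_cols = df.iloc[:, 1:3]  # column positions 1 and 2
series = df.iloc[:, 1]      # a single integer returns a Series

print(list(one_col.columns), one_col.shape)
print(list(two_cols.columns), two_cols.shape)
print(type(series).__name__, series.shape)
```

The values in one_col and series are identical; only the number of dimensions changes.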

Upvotes: 0
