lmm_5000

Reputation: 149

How do I interpret this notation?

How and why does the notation s[s] work?

I'm taking one of the micro-courses from kaggle.com and they use the notation s[s] as shown below. I have not seen that before. X_train is a pandas DataFrame.

Is it a list slicing itself? Would someone help clarify this?

s = (X_train.dtypes == 'object')  # True for each column whose dtype is object
object_cols = list(s[s].index)
> s

Type              True
Method            True
Regionname        True
Rooms            False
Distance         False
Postcode         False
Bedroom2         False
Bathroom         False
Landsize         False
Lattitude        False
Longtitude       False
Propertycount    False
dtype: bool
> s[s]

Type          True
Method        True
Regionname    True
dtype: bool

Upvotes: 3

Views: 1115

Answers (2)

Abrar Hossain

Reputation: 109

Several steps are packed into that one line, so take them one at a time.

X_train is a pandas data frame.

X_train.dtypes returns a pandas Series whose index (the label of each row) holds the column names and whose values are the dtypes of those columns.

Comparing that Series with == 'object' returns a new Series with the same index, where each value is True or False. So it looks like:

a     True
b    False
c     True

Now we get to s[s], which uses s as a boolean mask on itself: it keeps only the entries whose value is True and drops the False ones, giving a new Series:

a    True
c    True

Now we do .index and turn it into a list to give

['a', 'c']
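The steps above can be sketched on a toy frame. This is only an illustration: the three-column X_train below is a hypothetical stand-in for the Kaggle data, and the text columns are given dtype=object explicitly so the result does not depend on how a particular pandas version infers string columns.

```python
import pandas as pd

# Hypothetical stand-in for X_train (not the Kaggle data):
# columns "a" and "c" hold text stored as object, "b" holds integers.
X_train = pd.DataFrame({
    "a": pd.Series(["x", "y"], dtype=object),
    "b": [1, 2],
    "c": pd.Series(["u", "v"], dtype=object),
})

dtypes = X_train.dtypes             # Series: index = column names, values = dtypes
s = (dtypes == "object")            # boolean Series with the same index
filtered = s[s]                     # s used as a mask on itself: keeps the True entries
object_cols = list(filtered.index)  # the surviving labels are the column names

print(object_cols)
```

Because the mask and the Series being masked are the same object, the values left in s[s] are all True; the useful part is the index that survives the filter.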

Upvotes: 3

TomNash

Reputation: 3288

Pandas Series and DataFrames can both be indexed with a boolean array, which is how s is being used inside []. As you can see, the values of the Series are True or False, so s[s] selects the entries of s where s is True. The purpose of this code is to get the columns whose dtype is object; you can do the same thing with pandas.DataFrame.select_dtypes instead:

list(X_train.select_dtypes(include=['object']).columns)
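As a quick check that the two approaches agree, here is a minimal sketch. The small X_train is a hypothetical stand-in that borrows a few column names from the question, and the text columns are created with dtype=object explicitly.

```python
import pandas as pd

# Hypothetical stand-in for X_train with two object columns and one numeric column.
X_train = pd.DataFrame({
    "Type": pd.Series(["h", "u"], dtype=object),
    "Rooms": [2, 3],
    "Method": pd.Series(["S", "SP"], dtype=object),
})

# Approach from the question: boolean mask on the dtypes Series.
s = (X_train.dtypes == "object")
via_mask = list(s[s].index)

# Approach from this answer: let pandas filter the columns by dtype.
via_select = list(X_train.select_dtypes(include=["object"]).columns)

print(via_mask)
print(via_select)
```

Both lists contain the same column names in the same order, since each approach preserves the original column order of the frame.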

Upvotes: 1
