Reputation: 149
How and why does the notation s[s] work?
I'm taking one of the micro-courses from kaggle.com and they use the notation s[s] as shown below. I have not seen that before. X_train is a pandas DataFrame.
Is it a list slicing itself? Would someone help clarify this?
s = (X_train.dtypes == 'object')  # True for each column whose dtype is 'object'
object_cols = list(s[s].index)
> s
Type True
Method True
Regionname True
Rooms False
Distance False
Postcode False
Bedroom2 False
Bathroom False
Landsize False
Lattitude False
Longtitude False
Propertycount False
dtype: bool
> s[s]
Type True
Method True
Regionname True
dtype: bool
Upvotes: 3
Views: 1115
Reputation: 109
This is quite complicated.
X_train is a pandas DataFrame.
X_train.dtypes returns a pandas Series whose index (the label of each row) is the column names of X_train and whose values are the column dtypes.
Applying == 'object' to that Series returns a new boolean Series, with True where the dtype is object and False otherwise. So it looks like:
a True
b False
c True
Now we get to s[s], which uses the boolean Series as a mask on itself: it keeps only the rows where the value is True and drops the False ones, giving a new Series:
a True
c True
Finally, .index gives the labels of that filtered Series, and list() turns them into a list:
['a', 'c']
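The steps above can be sketched on a small, made-up DataFrame. The column names and values here are illustrative assumptions, not the Kaggle data, and the text columns are given dtype=object explicitly because some newer pandas versions may infer a dedicated string dtype instead.

```python
import pandas as pd

# Hypothetical toy frame standing in for X_train (not the Kaggle data).
X_train = pd.DataFrame({
    "Type": pd.Series(["h", "u", "t"], dtype=object),
    "Rooms": [2, 3, 4],
    "Method": pd.Series(["S", "SP", "PI"], dtype=object),
    "Distance": [2.5, 3.1, 7.8],
})

# Step 1: one dtype per column, indexed by column name.
dtypes = X_train.dtypes

# Step 2: boolean Series, True where the column dtype is object.
s = (dtypes == 'object')

# Step 3: boolean indexing; s is used as a mask on itself,
# so only the entries whose value is True are kept.
filtered = s[s]

# Step 4: the remaining index labels are the object column names.
object_cols = list(filtered.index)
print(object_cols)
```

Because the mask and the data are the same Series, every value left in s[s] is True; only the index carries useful information, which is why the code goes straight to .index.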
Upvotes: 3
Reputation: 3288
Pandas lets you index a Series or DataFrame with a boolean array, which is how s is being used inside []. The values of the Series are True or False, as you can see, so we're selecting the values of s where s is True. The purpose of this code is to get the columns whose datatype is object; you can do that with the method pandas.DataFrame.select_dtypes instead:
list(X_train.select_dtypes(include=['object']).columns)
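As a quick check that the two approaches agree, here is a minimal sketch on a made-up frame. The data is hypothetical, and the text columns are forced to dtype=object so the comparison does not depend on how a given pandas version infers string columns.

```python
import pandas as pd

# Hypothetical stand-in for X_train.
X_train = pd.DataFrame({
    "Type": pd.Series(["h", "u"], dtype=object),
    "Rooms": [2, 3],
    "Regionname": pd.Series(["Northern", "Western"], dtype=object),
})

# Boolean-mask approach from the question.
s = (X_train.dtypes == 'object')
via_mask = list(s[s].index)

# select_dtypes approach.
via_select = list(X_train.select_dtypes(include=['object']).columns)

print(via_mask == via_select)
```

Both lists keep the columns in their original order, so they can be compared directly.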
Upvotes: 1