Reputation: 1
import pandas as pd
train = pd.read_csv("https://datahack.analyticsvidhya.com/media/workshop_train_file/train_gbW7HTd.csv")
train[train.dtypes=='object']
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
Upvotes: 0
Views: 1410
Reputation: 210872
You can use the DataFrame.select_dtypes() method to select all columns of dtype object:
train.select_dtypes(['object'])
To select all non-numeric columns (strings, datetimes, etc.):
train.select_dtypes(exclude='number')
Demo:
In [92]: train.select_dtypes(['object']).head(2)
Out[92]:
          Workclass  Education      Marital.Status       Occupation   Relationship   Race   Sex  Native.Country  \
0         State-gov  Bachelors       Never-married     Adm-clerical  Not-in-family  White  Male   United-States
1  Self-emp-not-inc  Bachelors  Married-civ-spouse  Exec-managerial        Husband  White  Male   United-States

   Income.Group
0         <=50K
1         <=50K
In [93]: train.select_dtypes(exclude='number').head(2)
Out[93]:
          Workclass  Education      Marital.Status       Occupation   Relationship   Race   Sex  Native.Country  \
0         State-gov  Bachelors       Never-married     Adm-clerical  Not-in-family  White  Male   United-States
1  Self-emp-not-inc  Bachelors  Married-civ-spouse  Exec-managerial        Husband  White  Male   United-States

   Income.Group
0         <=50K
1         <=50K
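If you would rather try this without downloading the file, here is a minimal sketch on a made-up two-row frame. The column names and values are only illustrative and are not taken from the real dataset. The string column is pinned to dtype object explicitly, because some pandas versions may infer a dedicated string dtype instead.

```python
import pandas as pd

# Hypothetical toy frame standing in for `train`; values are invented.
df = pd.DataFrame({
    "Age": [39, 50],
    # Pin the dtype so the column is `object` regardless of pandas version.
    "Workclass": pd.Series(["State-gov", "Self-emp-not-inc"], dtype=object),
    "Hours": [40.0, 13.0],
    "Joined": pd.to_datetime(["2020-01-01", "2021-06-15"]),
})

# Only columns whose dtype is `object`.
object_cols = df.select_dtypes(include=["object"])

# Everything that is not numeric: here the object and datetime columns.
non_numeric = df.select_dtypes(exclude="number")

# The complement: integer and float columns.
numeric = df.select_dtypes(include="number")

print(list(object_cols.columns))
print(list(non_numeric.columns))
print(list(numeric.columns))
```

Note that exclude='number' also keeps the datetime column, so it is broader than include=['object'].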
Upvotes: 1
Reputation: 3295
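I think you are looking for .loc. The mask df.dtypes == 'object' is a boolean Series indexed by column name, so it has to go in the column position rather than being used as a row filter. With df as your DataFrame, try this: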
df.loc[:, df.dtypes == 'object'].head()
Or, if you just want the column names:
df.columns[df.dtypes == 'object']
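To see why the plain df[mask] form fails while .loc works, here is a small sketch on an invented frame (the data is hypothetical, and the object dtype is pinned explicitly so the result does not depend on how your pandas version infers string columns):

```python
import pandas as pd

# Hypothetical toy frame; values are invented for illustration.
df = pd.DataFrame({
    "Age": [39, 50],
    "Workclass": pd.Series(["State-gov", "Self-emp-not-inc"], dtype=object),
    "Sex": pd.Series(["Male", "Male"], dtype=object),
})

# Boolean Series whose index is the column names, not the row labels.
mask = df.dtypes == "object"
print(mask.index.tolist())

# df[mask] treats the mask as a row filter, so its index (column names)
# cannot be aligned with the row index and pandas raises.
try:
    df[mask]
    raised = False
except Exception as exc:
    raised = True
    print(type(exc).__name__)

# Putting the mask in the column slot of .loc selects columns instead.
selected = df.loc[:, mask]

# Just the matching column names.
names = df.columns[mask].tolist()

print(list(selected.columns))
print(names)
```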
Upvotes: 1