Pandas checking if a column is category issue

Question

I'm trying to loop over my columns and act differently when a column is category than when it's something else.

The following check works for a series that is category, but gives an error when checking a series with object dtype.

if series.dtype == 'category':
    # do something

This works on category, but if the dtype is object it throws:

Traceback (most recent call last):
  File "", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "", line 54, in run_data_template_task
    data_template.run(data_bundle, columns=columns)
  File "", line 531, in run
    self.to_parquet(data_bundle, columns=columns)
  File "", line 195, in to_parquet
    df = self.parse_df(df, columns=columns, overwrite_columns=overwrite_columns)
  File "", line 378, in parse_df
    df[col.name] = parse_series_with_nans(df[col.name], 'str')
  File "", line 369, in parse_series_with_nans
    if series.dtype == 'category':
TypeError: data type "category" not understood

On the other hand, using:

if series.dtype is 'category':
    # do something

returns False even when the dtype is category (which makes sense, because it's obviously not the same object).

A reproducible example:

         df = pd.DataFrame({'category_column': ['a', 'b', 'c'], 'other_column': [1, 2, 3]})
         df['category_column'] = df['category_column'].astype('category')
         df['category_column'].dtype is 'category'
Out[46]: False
         df['category_column'].dtype == 'category'
Out[47]: True
         df['other_column'].dtype == 'category'
Traceback (most recent call last):
  File "", line 3296, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in 
    d['other_column'].dtype == 'category'
TypeError: data type "category" not understood

user2314737 · Accepted Answer

df['category_column'].dtype is 'category'

is false because the two objects are not the same object.
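To make the identity point concrete, here is a minimal sketch. The variable names `dtype` and `label` are only illustrative. The comparison goes through variables rather than a string literal because recent Python versions warn about `is` with a literal.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'], dtype='category')
dtype = s.dtype        # a CategoricalDtype instance
label = 'category'     # a plain Python str

print(type(dtype).__name__)  # CategoricalDtype
print(type(label).__name__)  # str
print(dtype is label)        # False: two distinct objects of different types
print(dtype == label)        # True: equality is defined by CategoricalDtype
```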

On the other hand,

df['category_column'].dtype == 'category'

is true because

All instances of CategoricalDtype compare equal to the string 'category'.

(https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html#equality-semantics)

See also Understanding Python's "is" operator
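For the loop described in the question, one way to avoid comparing against a string at all is to test the type of the dtype. The traceback suggests that, on the versions the asker used, `== 'category'` on a NumPy-backed dtype makes NumPy try to interpret the string as a dtype name and fail. The sketch below avoids that path. It assumes a pandas version that exposes `pd.CategoricalDtype` at the top level, and `is_category` is a hypothetical helper name, not something from the question's code.

```python
import pandas as pd

def is_category(series: pd.Series) -> bool:
    """Hypothetical helper: True if the series has a categorical dtype."""
    # isinstance never parses a string as a dtype, so it cannot raise
    # the "data type not understood" TypeError from the question.
    return isinstance(series.dtype, pd.CategoricalDtype)

df = pd.DataFrame({'category_column': ['a', 'b', 'c'], 'other_column': [1, 2, 3]})
df['category_column'] = df['category_column'].astype('category')

for name in df.columns:
    if is_category(df[name]):
        print(name, 'is category')
    else:
        print(name, 'is something else')
```

This should take the second branch for non-categorical columns instead of raising.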
