Danish

Reputation: 2871

Convert a multi-category column into a two-category column in pandas

I have a dataframe as shown below.

df:

ID      tag              
1       pandas
2       numpy
3       matplotlib
4       pandas
5       pandas
6       sns
7       sklearn
8       sklearn
9       pandas
10      pandas

To the above df, I would like to add a column named tag_binary, which indicates whether the tag is pandas or not.

Expected output:

ID      tag            tag_binary         
1       pandas         pandas
2       numpy          non_pandas
3       matplotlib     non_pandas
4       pandas         pandas
5       pandas         pandas
6       sns            non_pandas
7       sklearn        non_pandas
8       sklearn        non_pandas
9       pandas         pandas
10      pandas         pandas

I tried the code below, using a dictionary and the map function, and it worked fine. But I am wondering whether there is an easier way that does not require writing out the complete dictionary.

d = {'pandas':'pandas', 'numpy':'non_pandas', 'matplotlib':'non_pandas',
    'sns':'non_pandas', 'sklearn':'non_pandas'}
df["tag_binary"] = df['tag'].map(d)

Upvotes: 2

Views: 542

Answers (3)

ALollz

Reputation: 59579

You can use where with an equality check to keep 'pandas' and fill everything else with 'non_pandas'.

df['tag_binary'] = df['tag'].where(df['tag'].eq('pandas'), 'non_pandas')

   ID         tag    tag_binary
0   1      pandas        pandas
1   2       numpy    non_pandas
2   3  matplotlib    non_pandas
3   4      pandas        pandas
4   5      pandas        pandas
5   6         sns    non_pandas
6   7     sklearn    non_pandas
7   8     sklearn    non_pandas
8   9      pandas        pandas
9  10      pandas        pandas

If you want something a little more flexible, so you can also map specific values to some label, then you can leverage the fact that for keys not in your dict, map returns NaN. So only specify mappings you care about and then fillna to deal with every other case.

# Could be more general like {'pandas': 'pandas', 'geopandas': 'pandas'}
d = {'pandas': 'pandas'} 
df['tag_binary'] = df['tag'].map(d).fillna('non_pandas')
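
As a minimal sketch of the more general case hinted at in the comment, the following groups two tags under one label. The sample data and the 'geopandas' tag are assumptions made only for illustration.

```python
import pandas as pd

# Hypothetical sample data; 'geopandas' is included only to illustrate grouping
df = pd.DataFrame({'tag': ['pandas', 'geopandas', 'numpy', 'sklearn']})

# List only the tags that should receive a specific label
d = {'pandas': 'pandas', 'geopandas': 'pandas'}

# Tags missing from d come back from map as NaN, and fillna replaces them
df['tag_binary'] = df['tag'].map(d).fillna('non_pandas')
print(df)
```

Note that any tag that is already missing (NaN) in the original column would also end up as 'non_pandas' with this approach.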

Upvotes: 4

Henry Ecker

Reputation: 35696

If you specifically need categorical data, whether to assign an ordering hierarchy, to ensure that only these values are permitted in the column, or simply to reduce memory usage, we can create a CategoricalDtype, convert with astype, and then use fillna to fill the NaN values introduced for values that are not among the categories:

cat_dtype = pd.CategoricalDtype(['pandas', 'non_pandas'])
df['tag_binary'] = df['tag'].astype(cat_dtype).fillna('non_pandas')

df:

   ID         tag  tag_binary
0   1      pandas      pandas
1   2       numpy  non_pandas
2   3  matplotlib  non_pandas
3   4      pandas      pandas
4   5      pandas      pandas
5   6         sns  non_pandas
6   7     sklearn  non_pandas
7   8     sklearn  non_pandas
8   9      pandas      pandas
9  10      pandas      pandas

Setup Used:

import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'tag': ['pandas', 'numpy', 'matplotlib', 'pandas', 'pandas', 'sns',
            'sklearn', 'sklearn', 'pandas', 'pandas']
})
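
To check that the result really is categorical, one could inspect the dtype and the integer codes behind the labels. This is a small sketch on a shortened, hypothetical sample rather than the full setup.

```python
import pandas as pd

# Shortened, hypothetical sample of the tag column
df = pd.DataFrame({'tag': ['pandas', 'numpy', 'sklearn', 'pandas']})

# Same conversion as in the answer: unknown tags become NaN, then get filled
cat_dtype = pd.CategoricalDtype(['pandas', 'non_pandas'])
df['tag_binary'] = df['tag'].astype(cat_dtype).fillna('non_pandas')

print(df['tag_binary'].dtype)                  # the column's dtype
print(list(df['tag_binary'].cat.categories))   # the permitted labels
print(df['tag_binary'].cat.codes.tolist())     # integer code stored per row
```

Each row is stored as a small integer code pointing into the category list, which is where the space saving comes from.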

Upvotes: 3

gal peled

Reputation: 482

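You can use apply:

def is_pandas(name):
    # Return the label for a single tag value
    if name == 'pandas':
        return 'pandas'  # or True
    return 'non_pandas'  # or False

# Pass the function directly; wrapping it in a lambda is unnecessary
df['tag_binary'] = df['tag'].apply(is_pandas)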
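
If a True/False flag is enough, as the "or True" comment suggests, a plain equality check gives it without apply. This is a sketch on hypothetical sample data, and the column name is_pandas is an assumption, not part of the question.

```python
import pandas as pd

# Hypothetical sample of the tag column
df = pd.DataFrame({'tag': ['pandas', 'numpy', 'sns', 'pandas']})

# eq compares every row to 'pandas' and returns a boolean Series
df['is_pandas'] = df['tag'].eq('pandas')
print(df)
```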

Upvotes: 3
