Reputation: 2871
I have a dataframe as shown below.
df:
ID tag
1 pandas
2 numpy
3 matplotlib
4 pandas
5 pandas
6 sns
7 sklearn
8 sklearn
9 pandas
10 pandas
To the above df, I would like to add a column named tag_binary, which indicates whether the tag is pandas or not.
Expected output:
ID tag tag_binary
1 pandas pandas
2 numpy non_pandas
3 matplotlib non_pandas
4 pandas pandas
5 pandas pandas
6 sns non_pandas
7 sklearn non_pandas
8 sklearn non_pandas
9 pandas pandas
10 pandas pandas
I tried the code below using a dictionary and the map function. It works fine, but I am wondering whether there is an easier way that doesn't require creating this complete dictionary.
d = {'pandas':'pandas', 'numpy':'non_pandas', 'matplotlib':'non_pandas',
'sns':'non_pandas', 'sklearn':'non_pandas'}
df["tag_binary"] = df['tag'].map(d)
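If the dictionary approach is kept, the lookup does not have to be typed out by hand. A minimal sketch, assuming the same df as above: the dict is built from the column's own unique values with a comprehension, so new tags are covered automatically.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'tag': ['pandas', 'numpy', 'matplotlib', 'pandas', 'pandas', 'sns',
            'sklearn', 'sklearn', 'pandas', 'pandas'],
})

# Build the lookup from the tags that actually occur, instead of listing them manually
d = {t: ('pandas' if t == 'pandas' else 'non_pandas') for t in df['tag'].unique()}
df['tag_binary'] = df['tag'].map(d)
print(df)
```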
Upvotes: 2
Views: 542
Reputation: 59579
You can use where with an equality check to keep 'pandas' and fill everything else with 'non_pandas'.
df['tag_binary'] = df['tag'].where(df['tag'].eq('pandas'), 'non_pandas')
ID tag tag_binary
0 1 pandas pandas
1 2 numpy non_pandas
2 3 matplotlib non_pandas
3 4 pandas pandas
4 5 pandas pandas
5 6 sns non_pandas
6 7 sklearn non_pandas
7 8 sklearn non_pandas
8 9 pandas pandas
9 10 pandas pandas
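A closely related alternative (a swapped-in technique, not part of the answer above) is numpy.where, which chooses between two values element-wise based on the same boolean mask. A minimal self-contained sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'tag': ['pandas', 'numpy', 'matplotlib', 'pandas', 'pandas', 'sns',
            'sklearn', 'sklearn', 'pandas', 'pandas'],
})

# np.where takes the second argument where the mask is True
# and the third argument everywhere else
df['tag_binary'] = np.where(df['tag'].eq('pandas'), 'pandas', 'non_pandas')
print(df)
```

Unlike Series.where, both the "true" and "false" values are given explicitly, so the kept value does not have to come from the original column.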
If you want something a little more flexible, so that you can also map specific values to some label, you can leverage the fact that map returns NaN for keys that are not in your dict. So only specify the mappings you care about, and then use fillna to deal with every other case.
# Could be more general like {'pandas': 'pandas', 'geopandas': 'pandas'}
d = {'pandas': 'pandas'}
df['tag_binary'] = df['tag'].map(d).fillna('non_pandas')
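To illustrate the more general dict from the comment above, here is a small sketch in which several tags map to the same label. The 'geopandas' value is hypothetical and does not appear in the question's data.

```python
import pandas as pd

# 'geopandas' is a made-up example tag, not from the question's df
s = pd.Series(['pandas', 'geopandas', 'numpy', 'sklearn'])

# Only the interesting tags are listed; everything else maps to NaN
d = {'pandas': 'pandas', 'geopandas': 'pandas'}
out = s.map(d).fillna('non_pandas')
print(out.tolist())  # ['pandas', 'pandas', 'non_pandas', 'non_pandas']
```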
Upvotes: 4
Reputation: 35696
If you specifically need "Categorical Data" (to impose an ordering, to ensure that only these values are permitted in the column, or simply to reduce memory usage), we can create a CategoricalDtype, make the conversion with astype, and then use fillna to fill the NaN values introduced when converting values that are not among the categories:
cat_dtype = pd.CategoricalDtype(['pandas', 'non_pandas'])
df['tag_binary'] = df['tag'].astype(cat_dtype).fillna('non_pandas')
df:
ID tag tag_binary
0 1 pandas pandas
1 2 numpy non_pandas
2 3 matplotlib non_pandas
3 4 pandas pandas
4 5 pandas pandas
5 6 sns non_pandas
6 7 sklearn non_pandas
7 8 sklearn non_pandas
8 9 pandas pandas
9 10 pandas pandas
Setup Used:
import pandas as pd
df = pd.DataFrame({
'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'tag': ['pandas', 'numpy', 'matplotlib', 'pandas', 'pandas', 'sns',
'sklearn', 'sklearn', 'pandas', 'pandas']
})
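As a sketch of the "ordering hierarchy" use case mentioned above, assuming you want 'pandas' rows to sort before 'non_pandas' rows, the same dtype can be declared with ordered=True. Sorting then follows the category order rather than alphabetical order.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'tag': ['pandas', 'numpy', 'matplotlib', 'pandas', 'pandas', 'sns',
            'sklearn', 'sklearn', 'pandas', 'pandas'],
})

# ordered=True makes 'pandas' < 'non_pandas' in comparisons and sorting
cat_dtype = pd.CategoricalDtype(['pandas', 'non_pandas'], ordered=True)
df['tag_binary'] = df['tag'].astype(cat_dtype).fillna('non_pandas')

print(df['tag_binary'].dtype)               # category
print(df.sort_values('tag_binary').head())  # all 'pandas' rows come first
```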
Upvotes: 3
Reputation: 482
You can use apply:
def is_pandas(name):
    if name == 'pandas':
        return 'pandas'  # or True
    return 'non_pandas'  # or False

df['tag_binary'] = df['tag'].apply(is_pandas)
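For the True/False variant hinted at in the comments, apply is not strictly needed: a vectorised comparison gives the boolean column directly. A minimal sketch; the is_pandas column name is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'tag': ['pandas', 'numpy', 'matplotlib', 'pandas', 'pandas', 'sns',
            'sklearn', 'sklearn', 'pandas', 'pandas'],
})

# Boolean flag computed for the whole column at once (hypothetical column name)
df['is_pandas'] = df['tag'].eq('pandas')

# If string labels are needed, map the booleans back to them
df['tag_binary'] = df['is_pandas'].map({True: 'pandas', False: 'non_pandas'})
print(df)
```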
Upvotes: 3