mortysporty

Reputation: 2889

Casting columns of categories to one string column in Python

This is a follow-up to a question I asked previously: "Oneliner to create string column from multiple columns".

I want to merge a subset of the columns in a dataframe into a new string column. @Zero was kind enough to give me a solution to this problem:

import pandas as pd

df = pd.DataFrame({'gender': ['m', 'f', 'f'],
                   'code': ['K2000', 'K2000', 'K2001']})


col_names = df.columns
df_str = df[col_names].astype(str).apply('_'.join, axis=1)
df_str
Out[17]: 
0       K2000_m
1       K2000_f
2       K2001_f
dtype: object

However, if I introduce interval data, this fails:

df = pd.DataFrame({'gender': ['m', 'f', 'f'],
                   'code': ['K2000', 'K2000', 'K2001'],
                   'num': pd.cut([3, 6, 9], [0, 5, 10])})
col_names = df.columns
df_str = df[col_names].astype(str).apply('_'.join, axis=1)

Ideally, I would also like to convert the columns to categorical data (which also fails):

df_cat = pd.concat([df['gender'].astype('category'),
                    df['code'].astype('category'),
                    df['num'].astype('category')], axis=1)
df_cat_str = df_cat[col_names].astype(str).apply('_'.join, axis=1)

What is going on here? And how can I achieve the desired output?

0   K2000_m_(0, 5]
1  K2000_f_(5, 10]
2  K2001_f_(5, 10]

As in the previous question, col_names should be a list containing any subset of the columns (not necessarily all columns, as in this example).

Upvotes: 2

Views: 546

Answers (1)

jezrael

Reputation: 863791

You need to convert the values to str separately for each row, inside a lambda function:

df_str = df[col_names].apply(lambda x: '_'.join(x.astype(str)), axis=1)
print(df_str)
0     K2000_m_(0, 5]
1    K2000_f_(5, 10]
2    K2001_f_(5, 10]
dtype: object
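
For an arbitrary subset of columns, the same idea can be wrapped in a small helper. This is only a sketch: `join_columns` and its `sep` parameter are hypothetical names, not part of pandas. The column order is passed explicitly, so the result does not depend on how the DataFrame happens to order its columns.

```python
import pandas as pd

df = pd.DataFrame({'gender': ['m', 'f', 'f'],
                   'code': ['K2000', 'K2000', 'K2001'],
                   'num': pd.cut([3, 6, 9], [0, 5, 10])})


def join_columns(frame, col_names, sep='_'):
    # Hypothetical helper: cast each row's values to str, then join them.
    return frame[col_names].apply(lambda x: sep.join(x.astype(str)), axis=1)


# Any subset of the columns, in the order they should appear in the string.
subset = join_columns(df, ['code', 'num'])
full = join_columns(df, ['code', 'gender', 'num'])

# The same call on a frame whose columns were all cast to category first.
df_cat = df.astype('category')
full_cat = join_columns(df_cat, ['code', 'gender', 'num'])

print(full)
```

Because the conversion happens per row on the values themselves, it should not matter whether the columns hold plain strings, intervals, or categoricals.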

Upvotes: 1
