Reputation: 1387
I have data like this:
user reg ind prod
A Asia Tele TV
A Asia Bank Phone
A Japan Tele Book
B US Fin Paper
B US Data Shop
B Asia Tele TV
B Africa Invest Book
C Asia Tele Paper
C Japan Fin TV
I want to convert the categories in each columns into individual columns and mark them as 1/0 if they there under the user, something like this:
User Asia Japan US Africa Tele Bank Fin Data Invest TV Phone Book Paper Shop
A 1 1 0 0 1 1 0 0 0 1 1 1 0 0
B 1 0 1 1 1 0 1 1 1 1 0 1 1 1
C 1 1 0 0 1 0 1 0 0 1 0 0 1 0
Essentially coding them to mark their association with each category. Also, if i can rename the columns such that it has the main column in it such as "reg_Asia" etc. I tried using Pivot, Pivot_table, Stack, Unstack but was not able to do it.
Upvotes: 2
Views: 310
Reputation: 863301
Use get_dummies
with set_index
and max
if want only 0/1
output or sum
if want count values:
df = pd.get_dummies(df.set_index('user'), prefix='', prefix_sep='').max(level=0).reset_index()
print (df)
user Africa Asia Japan US Bank Data Fin Invest Tele Book Paper \
0 A 0 1 1 0 1 0 0 0 1 1 0
1 B 1 1 0 1 0 1 1 1 1 1 1
2 C 0 1 1 0 0 0 1 0 1 0 1
Phone Shop TV
0 1 0 1
1 0 1 1
2 0 0 1
If want also categorize data is possible create MultiIndex
:
df1= pd.get_dummies(df.set_index('user')).max(level=0)
df1.columns = df1.columns.str.split('_', expand=True)
print (df1)
reg ind prod
Africa Asia Japan US Bank Data Fin Invest Tele Book Paper Phone Shop TV
user
A 0 1 1 0 1 0 0 0 1 1 0 1 0 1
B 1 1 0 1 0 1 1 1 1 1 1 0 1 1
C 0 1 1 0 0 0 1 0 1 0 1 0 0 1
print (df1['reg'])
Africa Asia Japan US
user
A 0 1 1 0
B 1 1 0 1
C 0 1 1 0
Upvotes: 4