Reputation: 261
I have to create a result pandas dataframe from a source pandas dataframe having two columns. The result dataframe should have headers of two types, one type should be from the source dataframe derived from one of the column values appending the column header with the values. The other header is taken as it is from the source dataframe with unique values if there are duplicates. The result dataframe cell values should be 1 depending on whether there was a corresponding derived header from the column values or 0 if none.
The dataframes are as below
dfm = pd.DataFrame({'v' : [44,39,39,8,40,8,15,15],'x':[1,1,1,1,2,2,2,2]})
dfm
v x
44 1
39 1
39 1
8 1
40 2
8 2
15 2
15 2
result
x v_8 v_15 v_39 v_40 v_44
1 1 0 1 0 1
2 1 1 0 1 0
Upvotes: 1
Views: 1568
Reputation: 862511
You can use function crosstab
, then find values higher as 1
and convert it to 1
and 0
by astype
:
dfm = pd.DataFrame({'v' : [44,39,39,8,40,8,15,15],'x':[1,1,1,1,2,2,2,2]})
print dfm
v x
0 44 1
1 39 1
2 39 1
3 8 1
4 40 2
5 8 2
6 15 2
7 15 2
df = pd.crosstab(dfm.x, dfm.v)
#rename columns
df.rename(columns=lambda x: 'v_' + str(x), inplace=True)
#reset column name v to ''
df.columns.name = ''
print df
v_8 v_15 v_39 v_40 v_44
x
1 1 0 2 0 1
2 1 2 0 1 0
print (df > 0)
v_8 v_15 v_39 v_40 v_44
x
1 True False True False True
2 True True False True False
print (df > 0).astype(int)
v_8 v_15 v_39 v_40 v_44
x
1 1 0 1 0 1
2 1 1 0 1 0
Upvotes: 2
Reputation: 4515
Could you use something like this. I'm not sure its the best solution so I'm interested to see what others post...
import pandas as pd
dfm = pd.DataFrame({'v' : [44,39,39,8,40,8,15,15],'x':[1,1,1,1,2,2,2,2]}).sort(columns="v")
col_heads = ["v_{}".format(x) for x in dfm.v.unique()]
row_index = dfm.x.unique()
nf = pd.DataFrame(index=row_index, columns=col_heads).fillna(0)
nf.index.name = 'x'
a = dfm.groupby("v")
for i,name in a:
for val in name.x:
nf.ix[val, "v_{}".format(i)] = 1
print nf
This outputs the following:
v_8 v_15 v_39 v_40 v_44
x
1 1 0 1 0 1
2 1 1 0 1 0
Upvotes: 2