Ashwin Jayarama
Ashwin Jayarama

Reputation: 261

Dataframe with column names derived from column values and cell values by condition

I have to create a result pandas dataframe from a source pandas dataframe having two columns. The result dataframe should have headers of two types, one type should be from the source dataframe derived from one of the column values appending the column header with the values. The other header is taken as it is from the source dataframe with unique values if there are duplicates. The result dataframe cell values should be 1 depending on whether there was a corresponding derived header from the column values or 0 if none.
The dataframes are as below

dfm = pd.DataFrame({'v' : [44,39,39,8,40,8,15,15],'x':[1,1,1,1,2,2,2,2]})  

dfm   
v   x  
44  1  
39  1
39  1
8   1
40  2
8   2
15  2
15  2

result

x v_8 v_15 v_39 v_40 v_44
1  1   0    1     0    1
2  1   1    0     1    0 

Upvotes: 1

Views: 1568

Answers (2)

jezrael
jezrael

Reputation: 862511

You can use function crosstab, then find values higher as 1 and convert it to 1 and 0 by astype:

dfm = pd.DataFrame({'v' : [44,39,39,8,40,8,15,15],'x':[1,1,1,1,2,2,2,2]})  
print dfm
    v  x
0  44  1
1  39  1
2  39  1
3   8  1
4  40  2
5   8  2
6  15  2
7  15  2

df = pd.crosstab(dfm.x, dfm.v)
#rename columns
df.rename(columns=lambda x: 'v_' + str(x), inplace=True)
#reset column name v to ''
df.columns.name = ''
print df
   v_8  v_15  v_39  v_40  v_44
x                             
1    1     0     2     0     1
2    1     2     0     1     0

print (df > 0)
    v_8   v_15   v_39   v_40   v_44
x                                  
1  True  False   True  False   True
2  True   True  False   True  False

print (df > 0).astype(int)
   v_8  v_15  v_39  v_40  v_44
x                             
1    1     0     1     0     1
2    1     1     0     1     0

Upvotes: 2

Jimbo
Jimbo

Reputation: 4515

Could you use something like this. I'm not sure its the best solution so I'm interested to see what others post...

import pandas as pd
dfm = pd.DataFrame({'v' : [44,39,39,8,40,8,15,15],'x':[1,1,1,1,2,2,2,2]}).sort(columns="v")

col_heads = ["v_{}".format(x) for x in dfm.v.unique()]
row_index = dfm.x.unique()

nf = pd.DataFrame(index=row_index, columns=col_heads).fillna(0)
nf.index.name = 'x'

a = dfm.groupby("v")
for i,name in a:
   for val in name.x:
      nf.ix[val, "v_{}".format(i)] = 1

print nf

This outputs the following:

   v_8  v_15  v_39  v_40  v_44
x
1    1     0     1     0     1
2    1     1     0     1     0

Upvotes: 2

Related Questions