Dataframe with column names derived from column values and cell values by condition

Question

I have to create a result pandas dataframe from a source pandas dataframe having two columns. The result dataframe should have headers of two types, one type should be from the source dataframe derived from one of the column values appending the column header with the values. The other header is taken as it is from the source dataframe with unique values if there are duplicates. The result dataframe cell values should be 1 depending on whether there was a corresponding derived header from the column values or 0 if none.
The dataframes are as below

dfm = pd.DataFrame({'v' : [44,39,39,8,40,8,15,15],'x':[1,1,1,1,2,2,2,2]})  

dfm   
v   x  
44  1  
39  1
39  1
8   1
40  2
8   2
15  2
15  2

result

x v_8 v_15 v_39 v_40 v_44
1  1   0    1     0    1
2  1   1    0     1    0

jezrael · Accepted Answer

You can use function crosstab, then find values higher as 1 and convert it to 1 and 0 by astype:

dfm = pd.DataFrame({'v' : [44,39,39,8,40,8,15,15],'x':[1,1,1,1,2,2,2,2]})  
print dfm
    v  x
0  44  1
1  39  1
2  39  1
3   8  1
4  40  2
5   8  2
6  15  2
7  15  2

df = pd.crosstab(dfm.x, dfm.v)
#rename columns
df.rename(columns=lambda x: 'v_' + str(x), inplace=True)
#reset column name v to ''
df.columns.name = ''
print df
   v_8  v_15  v_39  v_40  v_44
x                             
1    1     0     2     0     1
2    1     2     0     1     0

print (df > 0)
    v_8   v_15   v_39   v_40   v_44
x                                  
1  True  False   True  False   True
2  True   True  False   True  False

print (df > 0).astype(int)
   v_8  v_15  v_39  v_40  v_44
x                             
1    1     0     1     0     1
2    1     1     0     1     0

Dataframe with column names derived from column values and cell values by condition

Answers (2)

Related Questions