Data transformation: Take column value as column name

Question

I have a dataframe like this

df = {'ID': ['A', 'B', 'C'], '2': ['colname1', 'colname2', 'colname1'], '3': [3, 4, 0], '4':['colname3', 'colname3', 'colname3'], '5':[0, 2, 1]}
old = pd.DataFrame(data=df)
old

    ID  2           3       4       5
0   A   colname1    3   colname3    0
1   B   colname2    4   colname3    2
2   C   colname1    0   colname3    1

Where ID A has a value 3 for colname1 and ID B has a value 4 for colname2.

I am trying to clean it so that it looks like

df = {'ID': ['A', 'B', 'C'], 'colname1': [3, 'None', 0], 'colname2': ['None', 4, 'None'], 'colname3':[0, 2, 1]}
new = pd.DataFrame(data=df)
new

    ID  colname1    colname2    colname3
0   A   3           None        0
1   B   None        4           2
2   C   0           None        1

Please note this is a simple example. The actual dataset is a lot larger than this.

My thought was to build another dataframe, extracting all the distinct column names (which appears at the even column) first.

df.iloc[:,1::2].T.apply(lambda x: x.unique(), axis=1)

Then, write a loop to extract the values from the old dataframe to the new dataframe.

But I am not sure how to proceed. Is there a better way of doing this?

jezrael · Accepted Answer

One idea is use lreshape with DataFrame.pivot:

c1 = old.columns[1::2]
c2 = old.columns[2::2]

df = pd.lreshape(old, {'a':c1, 'b':c2}).pivot('ID','a','b')
#alternative if duplicates in `ID`, `a` pairs
#df = pd.lreshape(old, {'a':c1, 'b':c2}).pivot_table(index='ID',columns='a',values='b', aggfunc='mean')
print (df)
a   colname1  colname2  colname3
ID                              
A        3.0       NaN       0.0
B        NaN       4.0       2.0
C        0.0       NaN       1.0

Data transformation: Take column value as column name

Answers (1)

Related Questions