Reputation: 345
I have a dataframe like this
df = {'ID': ['A', 'B', 'C'], '2': ['colname1', 'colname2', 'colname1'], '3': [3, 4, 0], '4':['colname3', 'colname3', 'colname3'], '5':[0, 2, 1]}
old = pd.DataFrame(data=df)
old
ID 2 3 4 5
0 A colname1 3 colname3 0
1 B colname2 4 colname3 2
2 C colname1 0 colname3 1
Where ID A has a value 3 for colname1 and ID B has a value 4 for colname2.
I am trying to clean it so that it looks like
df = {'ID': ['A', 'B', 'C'], 'colname1': [3, 'None', 0], 'colname2': ['None', 4, 'None'], 'colname3':[0, 2, 1]}
new = pd.DataFrame(data=df)
new
ID colname1 colname2 colname3
0 A 3 None 0
1 B None 4 2
2 C 0 None 1
Please note this is a simple example. The actual dataset is a lot larger than this.
My thought was to build another dataframe, extracting all the distinct column names (which appears at the even column) first.
df.iloc[:,1::2].T.apply(lambda x: x.unique(), axis=1)
Then, write a loop to extract the values from the old dataframe to the new dataframe.
But I am not sure how to proceed. Is there a better way of doing this?
Upvotes: 1
Views: 187
Reputation: 863301
One idea is use lreshape
with DataFrame.pivot
:
c1 = old.columns[1::2]
c2 = old.columns[2::2]
df = pd.lreshape(old, {'a':c1, 'b':c2}).pivot('ID','a','b')
#alternative if duplicates in `ID`, `a` pairs
#df = pd.lreshape(old, {'a':c1, 'b':c2}).pivot_table(index='ID',columns='a',values='b', aggfunc='mean')
print (df)
a colname1 colname2 colname3
ID
A 3.0 NaN 0.0
B NaN 4.0 2.0
C 0.0 NaN 1.0
Upvotes: 2