Reputation: 1292
I have a pandas DataFrame whose column names contain information that I would like to extract into a new column.
It is best explained visually:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Number Type 1': [1, 2, np.nan],
                   'Number Type 2': [np.nan, 3, 4],
                   'Info': list('abc')})
The table shows the initial DataFrame with the Number Type 1 and Number Type 2 columns.
I would like to extract the types and create a new Type column, reshaping the DataFrame accordingly.
Basically, the numbers are collapsed into a single Number column, and the types are extracted into the Type column. The information in the Info column stays bound to the numbers (e.g. 2 and 3 have the same information, b).
What is the best way to do this in Pandas?
Upvotes: 1
Views: 56
Reputation: 863291
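# Unpivot the number columns into (Type, Number) pairs and drop the empty ones
df = df.melt('Info', value_name='Number', var_name='Type').dropna(subset=['Number'])
# Raw string avoids the invalid-escape warning; expand=False returns a Series
df['Type'] = df['Type'].str.extract(r'(\d+)', expand=False)
df['Number'] = df['Number'].astype(int)
print(df)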
  Info Type  Number
0    a    1       1
1    b    1       2
4    b    2       3
5    c    2       4
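The index in the output above skips 2 and 3 because those rows were the empty pairs removed by dropna. A self-contained sketch of the same melt approach, split into steps, makes that visible. The names long_df and tidy are only illustrative, and the final reset_index(drop=True) is an optional addition, not part of the original one-liner.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Number Type 1': [1, 2, np.nan],
                   'Number Type 2': [np.nan, 3, 4],
                   'Info': list('abc')})

# Step 1: melt keeps 'Info' as the identifier and turns the two number
# columns into (Type, Number) pairs, giving 6 rows with index 0..5.
long_df = df.melt('Info', var_name='Type', value_name='Number')

# Step 2: drop the pairs that had no number (index 2 and 3 here).
tidy = long_df.dropna(subset=['Number'])

# Step 3: keep only the digits of the old column name as the type,
# cast the numbers back to int, and renumber the rows from 0
# (the renumbering is an optional extra, assumed to be wanted here).
tidy = tidy.assign(
    Type=tidy['Type'].str.extract(r'(\d+)', expand=False),
    Number=tidy['Number'].astype(int),
).reset_index(drop=True)

print(tidy)
```

Note that Type holds strings after str.extract; add .astype(int) on that column as well if a numeric type is needed.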
Another solution with set_index and stack:
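# Start again from the original df: stack moves the column names into an index level
df = df.set_index('Info').stack().rename_axis(('Info','Type')).reset_index(name='Number')
# Raw string avoids the invalid-escape warning; expand=False returns a Series
df['Type'] = df['Type'].str.extract(r'(\d+)', expand=False)
df['Number'] = df['Number'].astype(int)
print(df)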
  Info Type  Number
0    a    1       1
1    b    1       2
2    b    2       3
3    c    2       4
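The stack version relies on stack discarding the missing values. As far as I know that has been the default behaviour, but the newer implementation (future_stack=True in recent pandas) is documented to keep them, so the sketch below drops them explicitly. Treat it as a more defensive variant under that assumption; the names stacked and tidy are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Number Type 1': [1, 2, np.nan],
                   'Number Type 2': [np.nan, 3, 4],
                   'Info': list('abc')})

# Move the column names into a second index level next to 'Info'.
stacked = df.set_index('Info').stack()

# Explicit dropna: a no-op if stack already removed the NaN entries,
# and a safeguard if the installed pandas version keeps them.
stacked = stacked.dropna()

# Name both index levels, then turn them back into ordinary columns.
tidy = stacked.rename_axis(['Info', 'Type']).reset_index(name='Number')

# Keep only the digits of the old column name and restore integer numbers.
tidy['Type'] = tidy['Type'].str.extract(r'(\d+)', expand=False)
tidy['Number'] = tidy['Number'].astype(int)

print(tidy)
```

Unlike the melt version, reset_index already produces a fresh 0..n-1 index here, so no extra renumbering is needed.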
Upvotes: 2