user3142067

Reputation: 1292

Pandas DataFrames: Extract Information and Collapse Columns

I have a pandas DataFrame whose column names carry information that I would like to extract into a new column.

It is best explained visually:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Number Type 1':[1,2,np.nan],
                   'Number Type 2':[np.nan,3,4],
                   'Info':list('abc')})

Initial DataFrame

The table shows the initial DataFrame with the Number Type 1 and Number Type 2 columns. I would like to extract the types into a new Type column and reshape the DataFrame accordingly.

Refactored DataFrame

Basically, the numbers are collapsed into a single Number column, and the types are extracted into the Type column. The information in the Info column stays bound to the numbers (e.g. 2 and 3 both have the information b).

What is the best way to do this in Pandas?

Upvotes: 1

Views: 56

Answers (1)

jezrael

Reputation: 863291

Use melt with dropna:

df = df.melt('Info', value_name='Number', var_name='Type').dropna(subset=['Number'])
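# raw string so that \d is not treated as a string escape; expand=False returns a Series
df['Type'] = df['Type'].str.extract(r'(\d+)', expand=False)
df['Number'] = df['Number'].astype(int)
print(df)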
  Info Type  Number
0    a    1       1
1    b    1       2
4    b    2       3
5    c    2       4
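
To make the intermediate steps visible, here is a minimal self-contained sketch of the same melt approach, split into stages. The names long_df and result are only illustrative, and the final reset_index(drop=True) is an optional extra that renumbers the rows from 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Number Type 1': [1, 2, np.nan],
                   'Number Type 2': [np.nan, 3, 4],
                   'Info': list('abc')})

# Step 1: wide -> long. Every (Info, number column) pair becomes its own row,
# so 3 Info values x 2 number columns gives 6 rows, NaN pairs included.
long_df = df.melt(id_vars='Info', var_name='Type', value_name='Number')

result = (
    long_df
    # Step 2: drop the pairs that carry no number.
    .dropna(subset=['Number'])
    # Step 3: keep only the digits of the old column name as the type,
    # and turn the numbers back into integers now that the NaNs are gone.
    .assign(Type=lambda d: d['Type'].str.extract(r'(\d+)', expand=False),
            Number=lambda d: d['Number'].astype(int))
    # Optional: renumber the rows 0..n-1 (illustrative extra step).
    .reset_index(drop=True)
)
print(result)
```

Note that Type holds strings here ('1', '2'); if integers are needed, an additional astype(int) on that column should work.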

Another solution with set_index and stack:

df = df.set_index('Info').stack().rename_axis(('Info','Type')).reset_index(name='Number')

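# raw string so that \d is not treated as a string escape; expand=False returns a Series
df['Type'] = df['Type'].str.extract(r'(\d+)', expand=False)
df['Number'] = df['Number'].astype(int)
print(df)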
  Info Type  Number
0    a    1       1
1    b    1       2
2    b    2       3
3    c    2       4
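
The stack approach relies on the missing values being gone before astype(int) runs. Whether stack discards them by itself may depend on the pandas version, so the following sketch assumes nothing about that and calls dropna explicitly; the variable name stacked is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Number Type 1': [1, 2, np.nan],
                   'Number Type 2': [np.nan, 3, 4],
                   'Info': list('abc')})

stacked = (
    df.set_index('Info')
      .stack()                         # Series indexed by (Info, old column name)
      .dropna()                        # explicit; harmless if NaNs are already gone
      .rename_axis(['Info', 'Type'])   # name both index levels
      .reset_index(name='Number')      # back to a flat DataFrame
)
stacked['Type'] = stacked['Type'].str.extract(r'(\d+)', expand=False)
stacked['Number'] = stacked['Number'].astype(int)
print(stacked)
```

If the explicit dropna turns out to be redundant in your setup, it does no harm: it simply returns the Series unchanged.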

Upvotes: 2
