Richard Kapustynskyj
Richard Kapustynskyj

Reputation: 95

How to I convert a pandas dataframe row into multiple rows

I have a pandas dataframe that has one row per object. Within that object, there are subobjects. I want to create a dataframe which contains one row per subobject.

I've read stuff on melt but can't begin to work out how to use it for what I want to do.

I want to go from

ObjectID    Sub1_ID Sub1_Var1   Sub1_Var2   Sub1_Var3   Sub2_ID Sub2_Var1   Sub2_Var2   Sub2_Var3
1           98398   3           10          9           19231           6           7           5
2           87868   8           5           4               
3           4579    5           6           6           24833           6           2           2
4           2514    1           6           9   

to

ObjectID    Sub_ID  Var1    Var2    Var3
1           98398   3       10      9
1           19231   6       7       5
2           87868   8       5       4
3           4579    5       6       6
3           24833   6       2       2
4           2514    1       6       9

Upvotes: 2

Views: 2085

Answers (3)

Scott Boston
Scott Boston

Reputation: 153460

One way you can do this is using MultiIndex with from_arrays and then use stack to reshape the dataframe:

df1 = df.set_index('ObjectID')

df1.columns = pd.MultiIndex.from_arrays(zip(*df1.columns.str.split('_')))

df1.stack(0).reset_index().drop('level_1', axis=1)

Output:

   ObjectID       ID  Var1  Var2  Var3
0         1  98398.0   3.0  10.0   9.0
1         1  19231.0   6.0   7.0   5.0
2         2  87868.0   8.0   5.0   4.0
3         3   4579.0   5.0   6.0   6.0
4         3  24833.0   6.0   2.0   2.0
5         4   2514.0   1.0   6.0   9.0

Upvotes: 10

Erfan
Erfan

Reputation: 42886

Another way would be:

  1. Split your dataframe by the Sub_ID column to two different dataframes
  2. Rename the columns in a general way by splitting them on _
  3. Concat the dataframes over axis=0 (on top of each other)
d1 = df.set_index('ObjectID').loc[:, 'Sub1_ID':'Sub1_Var3']
d2 = df.set_index('ObjectID').loc[:, 'Sub2_ID':]

d1.columns = d1.columns.str.split('_').str[1]
d2.columns = d2.columns.str.split('_').str[1]

pd.concat([d1, d2[d2['ID'].ne(0)]]).sort_index().reset_index()

   ObjectID     ID  Var1  Var2  Var3
0         1  98398     3    10     9
1         1  19231     6     7     5
2         2  87868     8     5     4
3         3   4579     5     6     6
4         3  24833     6     2     2
5         4   2514     1     6     9

Upvotes: 0

Serge Ballesta
Serge Ballesta

Reputation: 148890

You could concat the sub dataframes:

def ren_col(x):
    if x.startswith('Obj'): return x
    elif x.endswith('ID'): return 'Sub_ID'
    else: return x[-4:]

cols = df.columns    
resul = pd.concat([df[sub].rename(ren_col, axis=1).dropna()
                   for sub in [[cols[0]] + cols[i:i+4].tolist()
                               for i in range(1, len(cols), 4)]]
                  ).sort_values(['ObjectID', 'Sub_ID']).reset_index(drop=True)

It gives as expected:

   ObjectID   Sub_ID  Var1  Var2  Var3
0         1  19231.0   6.0   7.0   5.0
1         1  98398.0   3.0  10.0   9.0
2         2  87868.0   8.0   5.0   4.0
3         3   4579.0   5.0   6.0   6.0
4         3  24833.0   6.0   2.0   2.0
5         4   2514.0   1.0   6.0   9.0

Upvotes: 0

Related Questions