Reputation: 95
I have a pandas dataframe that has one row per object. Within that object, there are subobjects. I want to create a dataframe which contains one row per subobject.
I've read stuff on melt but can't begin to work out how to use it for what I want to do.
I want to go from
ObjectID Sub1_ID Sub1_Var1 Sub1_Var2 Sub1_Var3 Sub2_ID Sub2_Var1 Sub2_Var2 Sub2_Var3
1 98398 3 10 9 19231 6 7 5
2 87868 8 5 4
3 4579 5 6 6 24833 6 2 2
4 2514 1 6 9
to
ObjectID Sub_ID Var1 Var2 Var3
1 98398 3 10 9
1 19231 6 7 5
2 87868 8 5 4
3 4579 5 6 6
3 24833 6 2 2
4 2514 1 6 9
Upvotes: 2
Views: 2085
Reputation: 153460
One way you can do this is using MultiIndex with from_arrays
and then use stack
to reshape the dataframe:
df1 = df.set_index('ObjectID')
df1.columns = pd.MultiIndex.from_arrays(zip(*df1.columns.str.split('_')))
df1.stack(0).reset_index().drop('level_1', axis=1)
Output:
ObjectID ID Var1 Var2 Var3
0 1 98398.0 3.0 10.0 9.0
1 1 19231.0 6.0 7.0 5.0
2 2 87868.0 8.0 5.0 4.0
3 3 4579.0 5.0 6.0 6.0
4 3 24833.0 6.0 2.0 2.0
5 4 2514.0 1.0 6.0 9.0
Upvotes: 10
Reputation: 42886
Another way would be:
Sub_ID
column to two different dataframessplitting
them on _
axis=0
(on top of each other)d1 = df.set_index('ObjectID').loc[:, 'Sub1_ID':'Sub1_Var3']
d2 = df.set_index('ObjectID').loc[:, 'Sub2_ID':]
d1.columns = d1.columns.str.split('_').str[1]
d2.columns = d2.columns.str.split('_').str[1]
pd.concat([d1, d2[d2['ID'].ne(0)]]).sort_index().reset_index()
ObjectID ID Var1 Var2 Var3
0 1 98398 3 10 9
1 1 19231 6 7 5
2 2 87868 8 5 4
3 3 4579 5 6 6
4 3 24833 6 2 2
5 4 2514 1 6 9
Upvotes: 0
Reputation: 148890
You could concat the sub dataframes:
def ren_col(x):
if x.startswith('Obj'): return x
elif x.endswith('ID'): return 'Sub_ID'
else: return x[-4:]
cols = df.columns
resul = pd.concat([df[sub].rename(ren_col, axis=1).dropna()
for sub in [[cols[0]] + cols[i:i+4].tolist()
for i in range(1, len(cols), 4)]]
).sort_values(['ObjectID', 'Sub_ID']).reset_index(drop=True)
It gives as expected:
ObjectID Sub_ID Var1 Var2 Var3
0 1 19231.0 6.0 7.0 5.0
1 1 98398.0 3.0 10.0 9.0
2 2 87868.0 8.0 5.0 4.0
3 3 4579.0 5.0 6.0 6.0
4 3 24833.0 6.0 2.0 2.0
5 4 2514.0 1.0 6.0 9.0
Upvotes: 0