dataframe structure manipulation

Question

I have a very large CSV file with more than 100k records.
Each record is one object with an id and about 20 to 30 properties.
I need to reshape it so that each record is a triplet of id, non-empty property, and value.
I created a small sample data frame as an example.
Given the following data frame:

import pandas as pd

data = [{'id': 1, 'shape': 'circle', 'size': 10, 'color': 'green'},
        {'id': 2, 'shape': 'square', 'color': 'pink'},
        {'id': 3, 'shape': 'triangle', 'size': 5, 'color': 'black'},
        {'id': 4, 'shape': 'pentagon', 'size': 25}]
df = pd.DataFrame(data)
df
Out[10]: 
   color  id     shape  size
0  green   1    circle  10.0
1   pink   2    square   NaN
2  black   3  triangle   5.0
3    NaN   4  pentagon  25.0

Is there an efficient way to get a result that looks like this?

Out[17]: 
   id property     value
0   1   shape    circle
1   1    size        10
2   1   color     green
3   2   shape    square
4   2   color      pink
5   3   shape  triangle
6   3    size         5
7   3   color     black
8   4   shape  pentagon
9   4    size        25

Of course, NaN cells should be skipped.

anky · Accepted Answer

You can use pd.melt() to reshape the frame to long format, then drop the NaN rows and sort by id:

pd.melt(df,id_vars='id',var_name='Property').dropna().sort_values('id')
    id Property     value
0    1    color     green
4    1    shape    circle
8    1     size        10
1    2    color      pink
5    2    shape    square
2    3    color     black
6    3    shape  triangle
10   3     size         5
7    4    shape  pentagon
11   4     size        25

If you also want a clean, sequential index, reset it:

pd.melt(df,id_vars='id',var_name='Property').dropna().sort_values('id').reset_index(drop=True)

   id Property     value
0   1    color     green
1   1    shape    circle
2   1     size        10
3   2    color      pink
4   2    shape    square
5   3    color     black
6   3    shape  triangle
7   3     size         5
8   4    shape  pentagon
9   4     size        25
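
The output above lists the properties alphabetically within each id and names the column Property, while the question asked for shape, size, color and a lowercase property column. Here is a minimal sketch of one way to get that exact layout, assuming the property order can be passed explicitly through value_vars and that a stable sort (kind='mergesort') is acceptable. The helper name to_triplets is hypothetical and not part of pandas.

```python
import pandas as pd

data = [
    {'id': 1, 'shape': 'circle', 'size': 10, 'color': 'green'},
    {'id': 2, 'shape': 'square', 'color': 'pink'},
    {'id': 3, 'shape': 'triangle', 'size': 5, 'color': 'black'},
    {'id': 4, 'shape': 'pentagon', 'size': 25},
]
df = pd.DataFrame(data)


def to_triplets(frame, id_col='id', value_vars=None):
    """Hypothetical helper: reshape a wide frame into (id, property, value) rows."""
    # melt stacks the property columns one after another, in value_vars order
    long = pd.melt(frame, id_vars=id_col, value_vars=value_vars,
                   var_name='property', value_name='value')
    # drop the rows whose property was missing in the wide frame
    long = long.dropna(subset=['value'])
    # a stable sort keeps the value_vars order within each id
    long = long.sort_values(id_col, kind='mergesort')
    return long.reset_index(drop=True)


triplets = to_triplets(df, value_vars=['shape', 'size', 'color'])
print(triplets)
```

Because the size column contains NaN, pandas stores it as float, so the size values in the result are floats (10.0 rather than 10). If value_vars is left as None, melt uses every column except id, in the frame's own column order.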
