Reputation: 666
I have a very large csv file with more than 100k records.
In it, each record is one object with id and about 20~30 properties.
I need to manipulate it so each record is a triplet of id, none empty property, and a value.
I created a sample of a simple data frame to give an example.
Given the following data frame:
data = [{'id': 1, 'shape': 'circle', 'size': 10, 'color':'green'},
{'id': 2, 'shape': 'square', 'color':'pink'},
{'id': 3, 'shape': 'triangle', 'size': 5, 'color': 'black'},
{'id': 4, 'shape': 'pentagon', 'size': 25}]
df = pd.DataFrame(data)
df
Out[10]:
color id shape size
0 green 1 circle 10.0
1 pink 2 square NaN
2 black 3 triangle 5.0
3 NaN 4 pentagon 25.0
Is there an efficient way to get a result? It should look like this:
Out[17]:
id property value
0 1 shape circle
1 1 size 10
2 1 color green
3 2 shape square
4 2 color pink
5 3 shape triangle
6 3 size 5
7 3 color black
8 4 shape pentagon
9 4 size 25
of course, NaN cells should be skipped.
Upvotes: 4
Views: 65
Reputation: 8816
You can simply try as @Jon mentioned df.melt()
calling the melt()
on the dataframe with the id_vars
parameter set & sort the values.
>>> df.melt(id_vars='id', var_name='property').dropna().sort_values('id')
id property value
0 1 color green
4 1 shape circle
8 1 size 10
1 2 color pink
5 2 shape square
2 3 color black
6 3 shape triangle
10 3 size 5
7 4 shape pentagon
11 4 size 25
Read the article it's interesting where it says..
melt
, slightly slower than lreshape
and much faster than wide_to_long
.Upvotes: 2
Reputation: 75080
Using pd.melt()
pd.melt(df,id_vars='id',var_name='Property').dropna().sort_values('id')
id Property value
0 1 color green
4 1 shape circle
8 1 size 10
1 2 color pink
5 2 shape square
2 3 color black
6 3 shape triangle
10 3 size 5
7 4 shape pentagon
11 4 size 25
If order of index is important:
>>pd.melt(df,id_vars='id',var_name='Property').dropna().sort_values('id').reset_index(drop=True)
id Property value
0 1 color green
1 1 shape circle
2 1 size 10
3 2 color pink
4 2 shape square
5 3 color black
6 3 shape triangle
7 3 size 5
8 4 shape pentagon
9 4 size 25
Upvotes: 4