Reputation: 3897
I have a DataFrame where rows represent time and columns represent individuals. I want to turn it into into long panel data format in pandas in an efficient manner, as the DataFames are rather large. I would like to avoid looping. Here is an example: The following DataFrame:
id 1 2
date
20150520 3.0 4.0
20150521 5.0 6.0
should be transformed into:
date id value
20150520 1 3.0
20150520 2 4.0
20150520 1 5.0
20150520 2 6.0
Speed is what's really important to me, due to the data size. I prefer it over elegance if there is a tradeoff. Although I suspect I mam missing a rather simple function, pandas should be able to handle that. Any suggestions?
Upvotes: 3
Views: 1986
Reputation: 294218
using melt
pd.melt(df.reset_index(),
id_vars='date',
value_vars=['1', '2'],
var_name='Id')
EDIT:
Because OP wants fast ;-)
def pir(df):
dv = df.values
iv = df.index.values
cv = df.columns.values
rc, cc = df.shape
return pd.DataFrame(
dict(value=dv.flatten(),
id=np.tile(cv, rc)),
np.repeat(iv, cc))
Upvotes: 2
Reputation: 862511
I think you need stack
with reset_index
:
print (df)
1 2
date
20150520 3.0 4.0
20150521 5.0 6.0
df = df.stack().reset_index()
df.columns = ['date','id','value']
print (df)
date id value
0 20150520 1 3.0
1 20150520 2 4.0
2 20150521 1 5.0
3 20150521 2 6.0
print (df)
id 1 2
date
20150520 3.0 4.0
20150521 5.0 6.0
df = df.stack().reset_index(name='value')
print (df)
date id value
0 20150520 1 3.0
1 20150520 2 4.0
2 20150521 1 5.0
3 20150521 2 6.0
Upvotes: 3
Reputation: 418
the function you are looking for is
df.reset_index()
you can then rename your columns using
df.columns = ['date', 'id', 'value']
Upvotes: 1