Reputation: 25
I want to plot and compare sea levels from a large textfile which contains 3 types of sensors (enc,rad,prs). These sensors measured at close but different and irregular timestamps. The raw text file is like this
sensorid sensor stime slevel
100 rad 2007-11-24 18:47:00 8.0710001
100 rad 2007-11-24 18:49:00 8.066
100 rad 2007-11-24 18:51:00 8.0600004
100 rad 2007-11-24 18:53:00 8.0500002
101 enc 2007-11-24 18:55:00 6.0079999
100 rad 2007-11-24 18:55:00 8.0389996
99 prs 2007-11-24 18:56:00 7.7600002
99 prs 2007-11-24 18:57:00 7.7529998
I can easily put this file into a df DataFrame which have the time as index.
df=pd.read_table('data.txt',index_col='stime',parse_dates=['stime'])
sensorid sensor slevel
stime
2007-11-24 18:47:00 100 rad 8.071000
2007-11-24 18:49:00 100 rad 8.066000
2007-11-24 18:51:00 100 rad 8.060000
2007-11-24 18:53:00 100 rad 8.050000
2007-11-24 18:55:00 101 enc 6.008000
2007-11-24 18:55:00 100 rad 8.039000
2007-11-24 18:56:00 99 prs 7.760000
2007-11-24 18:57:00 99 prs 7.753000
2007-11-24 18:40:00 101 enc 6.066000
But I want a DataFrame with "enc", "rad" and "prs" as the column (and the sensorid) and the sea levels as the values with NaN when the sensor didn't measure for this timestamp.
I'm looking for a clever trick to do this rearranging in a single line ... a good test for oneliners.
Upvotes: 2
Views: 65
Reputation: 58
This is a case of pivoting the df. Try something like this:
df.pivot(index='stime', columns='sensors', values='slevel')
See for further info: http://pandas.pydata.org/pandas-docs/stable/reshaping.html
Upvotes: 4