Reputation: 331
I have a comma separated file that I am reading with Pandas via Python. Here is a subset:
Location Time Value1 Value2
CCNY 2013-01-01 00:00:00 59 12.71
CCNY 2013-01-01 01:00:00 96 10.6
CCNY 2013-01-01 02:00:00 105 11.94
CCNY 2013-01-01 03:00:00 81 11.73
CCNY 2013-01-01 04:00:00 60 13.05
CCNY 2013-01-01 05:00:00 51 13.25
...
CCNY 2013-01-31 06:00:00 28 13.03
I need to plot Value1 (x-axis) against Value2 (y-axis), with a separate plot for each day. This portion of the file covers the entire month of January, so there will be 31 plots.
How should I go about this?
(The ultimate goal is to get best fit lines in each plot and r squared values.)
Thanks.
Upvotes: 2
Views: 76
Reputation: 393893
Your CSV looks like a fixed-width file, so I would use read_fwf. You then need to rename the combined date column, because the time portion is treated as an unnamed column. After that you can use @chrisB's answer to achieve what you want:
In [35]:
import io
import pandas as pd

t="""Location Time Value1 Value2
CCNY 2013-01-01 00:00:00 59 12.71
CCNY 2013-01-01 01:00:00 96 10.6
CCNY 2013-01-01 02:00:00 105 11.94
CCNY 2013-01-02 03:00:00 81 11.73
CCNY 2013-01-02 04:00:00 60 13.05
CCNY 2013-01-02 05:00:00 51 13.25"""
df = pd.read_fwf(io.StringIO(t), parse_dates=[[1,2]])
df.rename(columns={'Time_Unnamed: 2':'Time'},inplace=True)
df
Out[35]:
Time Location Value1 Value2
0 2013-01-01 00:00:00 CCNY 59 12.71
1 2013-01-01 01:00:00 CCNY 96 10.60
2 2013-01-01 02:00:00 CCNY 105 11.94
3 2013-01-02 03:00:00 CCNY 81 11.73
4 2013-01-02 04:00:00 CCNY 60 13.05
5 2013-01-02 05:00:00 CCNY 51 13.25
In [36]:
df.groupby(df['Time'].dt.date).plot(x='Value1', y='Value2')
Out[36]:
2013-01-01 Axes(0.125,0.125;0.775x0.775)
2013-01-02 Axes(0.125,0.125;0.775x0.775)
dtype: object
This results in one plot per date, here one for 2013-01-01 and one for 2013-01-02.
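As far as I know, newer pandas releases deprecate combining columns through a nested list in parse_dates, so the read step above may warn or fail depending on your version. A minimal sketch of an alternative, assuming the file is whitespace-delimited as in the sample: read the two halves of the timestamp as separate columns and join them with pd.to_datetime. The column names "Date" and "Clock" are hypothetical and only exist for this illustration.

```python
import io

import pandas as pd

# Sample rows from above; the header has 4 names but each row has 5 fields.
raw = """Location Time Value1 Value2
CCNY 2013-01-01 00:00:00 59 12.71
CCNY 2013-01-01 01:00:00 96 10.6
CCNY 2013-01-01 02:00:00 105 11.94
CCNY 2013-01-02 03:00:00 81 11.73
CCNY 2013-01-02 04:00:00 60 13.05
CCNY 2013-01-02 05:00:00 51 13.25"""

# Skip the header line and name the five whitespace-separated fields explicitly.
# "Date" and "Clock" are hypothetical names for the two parts of the timestamp.
df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",
    skiprows=1,
    names=["Location", "Date", "Clock", "Value1", "Value2"],
)

# Join the two parts into one datetime column, then drop the helper columns.
df["Time"] = pd.to_datetime(df["Date"] + " " + df["Clock"])
df = df.drop(columns=["Date", "Clock"])
```

The resulting frame has a proper datetime Time column, so the groupby on df['Time'].dt.date shown above should work unchanged.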
Upvotes: 1
Reputation: 52236
The following groups the data by day and produces a plot for each:
df.groupby(df['Time'].dt.day).plot(x='Value1', y='Value2')
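Since the stated goal is a best-fit line and an r-squared value per day, here is a rough sketch of how the same grouping could feed a straight-line fit. It assumes an ordinary least-squares line via np.polyfit, for which r-squared equals the squared Pearson correlation; the helper name fit_line is made up for this example. Note that .dt.day is the day-of-month number, so data spanning several months would merge, say, 1 January with 1 February; grouping on .dt.date keeps them apart, which is what the sketch uses.

```python
import numpy as np
import pandas as pd

# The six sample rows from the other answer, already parsed into a frame.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2013-01-01 00:00:00", "2013-01-01 01:00:00", "2013-01-01 02:00:00",
        "2013-01-02 03:00:00", "2013-01-02 04:00:00", "2013-01-02 05:00:00",
    ]),
    "Value1": [59, 96, 105, 81, 60, 51],
    "Value2": [12.71, 10.6, 11.94, 11.73, 13.05, 13.25],
})


def fit_line(group):
    """Hypothetical helper: least-squares line Value2 = slope * Value1 + intercept."""
    x = group["Value1"].to_numpy(dtype=float)
    y = group["Value2"].to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    # For a straight-line fit with an intercept, R^2 is the squared correlation.
    r = np.corrcoef(x, y)[0, 1]
    return pd.Series({"slope": slope, "intercept": intercept, "r_squared": r ** 2})


# One row of fit results per calendar date.
fits = df.groupby(df["Time"].dt.date)[["Value1", "Value2"]].apply(fit_line)
print(fits)
```

To draw the line on each plot, one option is to loop over the groups, call plot(kind='scatter', x='Value1', y='Value2') on each, and add the fitted line to the returned axes using the slope and intercept from fits.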
Upvotes: 3