Reputation: 3918
I have a csv file which contains date and time stamps as two of the columns. I am using pandas read_csv
to read the contents into a dataframe. My ultimate goal is to plot time series graphs from the data.
!head vmstat.csv
wait_proc,sleep_proc,swapped_memory,free_memory,buffered_memory,cached_memory,swapped_in,swapped_out,received_block,sent_block,interrups,context_switches,user_time,sys_time,idle_time,wait_io_time,stolen_time,date,time
0,0,10896,3776872,380028,10284052,0,0,6,16,7716,4755,3,1,96,0,0,2012-11-01,08:59:27
0,0,10896,3776500,380028,10284208,0,0,0,40,7471,4620,0,0,99,0,0,2012-11-01,08:59:32
0,0,10896,3749840,380028,10286864,0,0,339,19,7479,4704,20,2,77,1,0,2012-11-01,08:59:37
0,0,10896,3747536,380028,10286964,0,0,17,118,7488,4638,0,0,99,0,0,2012-11-01,08:59:42
0,0,10896,3747452,380028,10287148,0,0,0,24,7489,4676,0,0,99,0,0,2012-11-01,08:59:47
from pandas import DataFrame, read_csv
df = read_csv("vmstat.csv", parse_dates=[['date','time']])
f = DataFrame(df, columns=['date_time', 'user_time', 'sys_time', 'wait_io_time'])
In [3]: f
Out[3]:
            date_time  user_time  sys_time  wait_io_time
0 2012-11-01 08:59:27          3         1             0
1 2012-11-01 08:59:32          0         0             0
2 2012-11-01 08:59:37         20         2             1
3 2012-11-01 08:59:42          0         0             0
4 2012-11-01 08:59:47          0         0             0
So far the data is read correctly, and date and time are combined into date_time in the DataFrame. The problem appears when I try to use date_time from df as the index: specifying index = df.date_time gives all NaN values:
dindex = f['date_time']
print dindex
g = DataFrame(f, columns=[ 'user_time', 'sys_time', 'wait_io_time'], index=dindex)
In [7]: g
Out[7]:
0   2012-11-01 08:59:27
1   2012-11-01 08:59:32
2   2012-11-01 08:59:37
3   2012-11-01 08:59:42
4   2012-11-01 08:59:47
Name: date_time    <---- dindex
g:
                     user_time  sys_time  wait_io_time
date_time
2012-11-01 08:59:27        NaN       NaN           NaN
2012-11-01 08:59:32        NaN       NaN           NaN
2012-11-01 08:59:37        NaN       NaN           NaN
2012-11-01 08:59:42        NaN       NaN           NaN
2012-11-01 08:59:47        NaN       NaN           NaN
As you can see, the column values all come out as NaN. How do I get the correct values, as in the intermediate frame f?
Upvotes: 2
Views: 3699
Reputation: 375605
You want to use set_index:
df1 = df.set_index('date_time')
This selects the column 'date_time' as the index of the new DataFrame.
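Note: When the DataFrame constructor is given an existing DataFrame together with index=, it does not relabel the rows. It reindexes the data to the labels you pass, so any label missing from the original index produces a row of NaN. The behaviour is demonstrated as follows: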
import pandas as pd
df = pd.DataFrame([[1,2],[3,4]])
df1 = pd.DataFrame(df, index=[1,2])
In [3]: df1
Out[3]:
     0    1
1    3    4
2  NaN  NaN
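To make the difference concrete, here is a small sketch comparing the constructor with reindex, set_index, and direct assignment to .index. The toy frame and the labels "a" and "b" are made up for illustration and are not from the question's data.

```python
import pandas as pd

# Toy frame with the default integer index 0, 1 (illustrative data only).
df = pd.DataFrame({"key": ["a", "b"], "value": [10, 20]})

# Passing index= to the constructor selects rows by label, like reindex.
# Neither "a" nor "b" is an existing row label, so every cell becomes NaN.
via_constructor = pd.DataFrame(df, index=["a", "b"])
via_reindex = df.reindex(["a", "b"])

# set_index moves the column into the index and keeps the row values.
via_set_index = df.set_index("key")

# Assigning to .index relabels the rows in place, also keeping the values.
relabelled = df[["value"]].copy()
relabelled.index = df["key"]
```

In the question, f has the integer index 0 to 4 and the requested labels are timestamps, which is the same situation as the "a"/"b" labels here.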
Upvotes: 3
Reputation: 3918
I found a workaround with the following code:
up = f.pivot_table('user_time', rows='date_time')
sp = f.pivot_table('sys_time', rows='date_time')
wp = f.pivot_table('wait_io_time', rows='date_time')
u=pandas.DataFrame(up)
u['sys_time']=sp
u['wait_io_time']=wp
my_colors = ["#FF6666", "#00CC33", "#44EEEE"]
print u
Out:
                     user_time  sys_time  wait_io_time
date_time
2012-11-01 08:59:27          3         1             0
2012-11-01 08:59:32          0         0             0
2012-11-01 08:59:37         20         2             1
2012-11-01 08:59:42          0         0             0
2012-11-01 08:59:47          0         0             0
There should be more straightforward ways to achieve this, but I am new to pandas.
Moreover, u.plot() fails to plot a time series graph, raising "AttributeError: 'numpy.int64' object has no attribute 'ordinal'". So I am waiting to hear a better solution from others.
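For comparison, here is a minimal sketch of one way to get the same frame without pivot_table. It assumes a reasonably recent pandas, reads two rows of vmstat.csv (trimmed to the relevant columns) from an in-memory string, and combines date and time with pd.to_datetime rather than the parse_dates option. Whether it avoids the AttributeError above is untested here and may depend on the pandas version.

```python
import io

import pandas as pd

# Two rows in the same layout as vmstat.csv, trimmed to the relevant columns.
csv_text = """user_time,sys_time,wait_io_time,date,time
3,1,0,2012-11-01,08:59:27
20,2,1,2012-11-01,08:59:37
"""

df = pd.read_csv(io.StringIO(csv_text))

# Combine the two text columns into one datetime column explicitly.
df["date_time"] = pd.to_datetime(df["date"] + " " + df["time"])

# Use the timestamps as the index and keep only the columns to plot.
u = df.set_index("date_time")[["user_time", "sys_time", "wait_io_time"]]

# u.index is now a DatetimeIndex. Calling u.plot() (which needs matplotlib)
# is left out so the sketch runs without a plotting backend.
```

With the real file, io.StringIO(csv_text) would be replaced by the path "vmstat.csv".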
Upvotes: 0