Reputation: 81
I have a text file with a few lines that I want to read in as a pandas DataFrame. Here are the lines, which I copied from the original text file and saved into another text file:
MTU, Time, Power, Cost, Voltage
MTU1,05/11/2015 19:59:06,4.102,0.62,122.4
MTU1,05/11/2015 19:59:05,4.089,0.62,122.3
MTU1,05/11/2015 19:59:04,4.089,0.62,122.3
MTU1,05/11/2015 19:59:06,4.089,0.62,122.3
MTU1,05/11/2015 19:59:04,4.097,0.62,122.4
MTU1,05/11/2015 19:59:03,4.097,0.62,122.4
MTU1,05/11/2015 19:59:02,4.111,0.62,122.5
MTU1,05/11/2015 19:59:03,4.111,0.62,122.5
MTU1,05/11/2015 19:59:02,4.104,0.62,122.5
MTU1,05/11/2015 19:59:01,4.090,0.62,122.4
MTU1,05/11/2015 19:59:00,4.093,0.62,122.4
MTU1,05/11/2015 19:58:59,4.112,0.62,122.5
MTU1,05/11/2015 19:58:58,4.107,0.62,122.6
MTU1,05/11/2015 19:58:57,4.092,0.62,122.7
I read in the text file using the following:
import pandas as pd
energy = pd.read_csv("energy.txt", sep=",")
# Reading in first 5 rows of data.
energy.head()
I get this:
Out[65]:
MTU Time Power Cost Voltage
0 MTU1 05/11/15 19:59 4.102 0.62 122.4
1 MTU1 05/11/15 19:59 4.089 0.62 122.3
2 MTU1 05/11/15 19:59 4.089 0.62 122.3
3 MTU1 05/11/15 19:59 4.089 0.62 122.3
4 MTU1 05/11/15 19:59 4.097 0.62 122.4
I suspect the problem is that the columns are still stored as strings, so I converted them to numeric with the following:
energy=energy.convert_objects(convert_numeric=True)
But when I try to plot the Power variable against Time to see the trend over time, I get an error:
energy.plot(energy.time,energy.power)
if isinstance(obj, tuple) and is_setter:
1142 return {'key': obj}
-> 1143 raise KeyError('%s not in index' % objarr[mask])
1144
1145 return _values_from_object(indexer)
KeyError: '[ 4.102 4.089 4.089 4.089 4.097 4.097 4.111 4.111 4.104 4.09\n 4.093 4.112 4.107 4.092 4.092 4.109 4.107 4.107 4.092 4.092\n 4.092 4.107 4.109 4.094 4.09 4.103 4.103 4.103 4.11 4.096\n 4.122 4.156 4.154 4.154 4.144 4.15 4.16 4.16 4.163 4.163\n 4.154 4.15 4.157 4.167 4.16 4.149 4.153 4.165 4.166 4.155\n 4.151 4.164 4.172 4.161 4.152 4.16
I guess it's because the Power variable still has "\n" appended to some values. How do I fix this error?
Upvotes: 0
Views: 107
Reputation: 16134
I'm on pandas 0.16 and it works fine for me. The column names do have leading whitespace, though, because the header line has a space after each comma:
In [48]: energy
Out[48]:
MTU Time Power Cost Voltage
0 MTU1 05/11/2015 19:59:06 4.102 0.62 122.4
1 MTU1 05/11/2015 19:59:05 4.089 0.62 122.3
2 MTU1 05/11/2015 19:59:04 4.089 0.62 122.3
3 MTU1 05/11/2015 19:59:06 4.089 0.62 122.3
4 MTU1 05/11/2015 19:59:04 4.097 0.62 122.4
5 MTU1 05/11/2015 19:59:03 4.097 0.62 122.4
6 MTU1 05/11/2015 19:59:02 4.111 0.62 122.5
7 MTU1 05/11/2015 19:59:03 4.111 0.62 122.5
8 MTU1 05/11/2015 19:59:02 4.104 0.62 122.5
9 MTU1 05/11/2015 19:59:01 4.090 0.62 122.4
10 MTU1 05/11/2015 19:59:00 4.093 0.62 122.4
11 MTU1 05/11/2015 19:58:59 4.112 0.62 122.5
12 MTU1 05/11/2015 19:58:58 4.107 0.62 122.6
13 MTU1 05/11/2015 19:58:57 4.092 0.62 122.7
In [49]: energy.columns
Out[49]: Index([u'MTU', u' Time', u' Power', u' Cost', u' Voltage'], dtype='object')
In [50]: energy.plot(x=' Time', y=' Power') # or energy.plot(' Time', ' Voltage')
Out[50]: <matplotlib.axes.AxesSubplot at 0x10847ffd0>
Here's the plot with x as Time and y as Power:
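If you would rather not type the leading space in every column name, one option is to strip it while reading the file. The sketch below is only an illustration: it reads a few rows from the question out of an in-memory string instead of energy.txt, and it assumes the timestamps are month/day/year, which the sample data does not confirm.

```python
import io

import pandas as pd

# A few rows copied from the question; the header has a space after each comma.
raw = """MTU, Time, Power, Cost, Voltage
MTU1,05/11/2015 19:59:06,4.102,0.62,122.4
MTU1,05/11/2015 19:59:05,4.089,0.62,122.3
MTU1,05/11/2015 19:59:04,4.089,0.62,122.3
"""

# skipinitialspace=True drops whitespace that follows a delimiter,
# so the header parses as 'Time', 'Power', ... without the leading space.
energy = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

# Assumption: dates are month/day/year; swap %m and %d if they are day-first.
energy["Time"] = pd.to_datetime(energy["Time"], format="%m/%d/%Y %H:%M:%S")

print(energy.columns.tolist())
print(energy.dtypes)

# Plotting needs matplotlib; pass column names, not the column values:
# energy.plot(x="Time", y="Power")
```

The KeyError in the question most likely comes from passing the column values themselves to plot: pandas treats the x and y arguments as column labels, so it looks for columns named 4.102, 4.089 and so on. The "\n" in the message is just how the long array was wrapped when printed, not something stored in the data.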
Upvotes: 1