Reputation: 23
I am trying to learn how to use pandas in Python, and I am having trouble doing math on my pandas DataFrame. Right now my DataFrame looks something like this:
print (mark)
0 1 2 3 4 5 6
0 447366345 -2.04 -2.69 176.98 418.84 34.3167521 -118.4068498
1 447406197 -2.34 -2.18 176.88 418.77 34.3167522 -118.4068499
2 447446155 -2.63 -1.56 176.74 418.77 34.3167522 -118.4068499
3 447486653 -2.89 -0.95 176.58 418.84 34.3167522 -118.4068499
4 447526241 -3.12 -0.42 176.43 418.84 34.3167522 -118.4068499
5 447566373 -3.34 -0.07 176.32 418.84 34.3167522 -118.4068497
6 447606036 -3.56 0.05 176.26 418.66 34.3167523 -118.4068497
7 447645783 -3.77 -0.03 176.28 418.66 34.3167523 -118.4068497
8 447686269 -3.95 -0.31 176.43 418.95 34.3167523 -118.4068497
import csv
import pandas as pd

def data_reader(filename, rowname):
    # Yield every field after the first for rows whose first field equals rowname.
    with open(filename, newline='') as fp:
        yield from (row[1:] for row in csv.reader(fp, skipinitialspace=True)
                    if row[0] == rowname)

mike = pd.DataFrame.from_records(data_reader('data.csv', 'mike'))
Now let's say I want to take column 0 and divide it by 1000:
mark_time = mark[0] / 1000
This produces the error
TypeError: unsupported operand type(s) for /: 'str' and 'int'
I am guessing this is because the values in my DataFrame are currently not treated as numbers, so I went ahead and did this:
mark_time = float (mark[0] / 1000)
However, this also gave me the same error. Could someone please explain to me why?
My second question is about plotting. I know matplotlib well and want to use it with my pandas DataFrame. Currently I do it like this:
fig1 = plt.figure(figsize= (10,10))
ax = fig1.add_subplot(311)
ax.plot(mike_time, mike[0], label='mike speed', color = 'red')
plt.legend(loc='best',prop={'size':10})
Could I just replace mike_time and mike[0] with my DataFrame?
Upvotes: 1
Views: 15257
Reputation: 3127
You need to use pandas.read_csv instead of Python's csv module.
There you can use the dtype argument to tell pandas which data type to use for each column.
From the pandas documentation:
dtype : Type name or dict of column -> type, default None. Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32} (unsupported with engine='python'). Use str or object to preserve and not interpret dtype.
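As a minimal sketch of the dtype approach, assuming the file has no header row and that each line starts with a name followed by numeric fields (the SAMPLE text below is a made-up stand-in for data.csv, and the column types are guesses based on the printed data):

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: a name in the first field, then numbers.
SAMPLE = """\
mike, 447366345, -2.04, -2.69
mike, 447406197, -2.34, -2.18
mark, 447446155, -2.63, -1.56
"""

# header=None because the sample has no header row, so pandas labels the
# columns 0, 1, 2, ... and those integers can be used as dtype keys.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    header=None,
    skipinitialspace=True,
    dtype={0: str, 1: np.int64, 2: np.float64, 3: np.float64},
)

# Keep only one name's rows, mirroring data_reader(filename, 'mike'),
# then drop the name column so only numeric columns remain.
mike = df[df[0] == "mike"].drop(columns=0)

# Column 1 is numeric now, so the division no longer raises TypeError.
mike_time = mike[1] / 1000
print(mike_time)
```

With a real file, io.StringIO(SAMPLE) would be replaced by the file path.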
If you must parse the CSV outside pandas and import it with from_records, you can use coerce_float=True. From the documentation:
coerce_float : boolean, default False. Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.
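Note that the quoted description refers to non-string objects, so fields coming out of csv.reader, which are always strings, may still need an explicit conversion. A hedged sketch of that route, assuming every remaining field is numeric (SAMPLE is a made-up stand-in for data.csv, and data_reader is adapted here to take an open file object):

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for data.csv: a name in the first field, then numbers.
SAMPLE = """\
mike, 447366345, -2.04, -2.69
mike, 447406197, -2.34, -2.18
mark, 447446155, -2.63, -1.56
"""


def data_reader(fp, rowname):
    """Yield the fields after the first one for rows whose first field is rowname."""
    for row in csv.reader(fp, skipinitialspace=True):
        if row and row[0] == rowname:
            yield row[1:]


# csv.reader returns every field as a str, so every cell here is a string.
raw = pd.DataFrame.from_records(list(data_reader(io.StringIO(SAMPLE), "mike")))

# Convert all columns to float in one step; this raises ValueError if any
# cell is not a valid number.
mike = raw.astype(float)

mike_time = mike[0] / 1000
print(mike_time)
```

This also suggests why float(mark[0] / 1000) gives the same error: the division is evaluated first, on strings, before float() is ever called.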
Upvotes: 1
Reputation: 2821
You need to use pandas.read_csv, which will automatically assign the most appropriate type to each column. If you have any mixed-type columns, it will warn you. You can then run it again, setting the type explicitly.
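A small sketch of letting read_csv infer the types and then checking what it chose, assuming a headerless file shaped like the one in the question (SAMPLE is a made-up stand-in for data.csv):

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: a name in the first field, then numbers.
SAMPLE = """\
mike, 447366345, -2.04, -2.69
mike, 447406197, -2.34, -2.18
mark, 447446155, -2.63, -1.56
"""

# No dtype argument: pandas infers a type for each column on its own.
df = pd.read_csv(io.StringIO(SAMPLE), header=None, skipinitialspace=True)

# Inspect the inferred types before doing any arithmetic.
print(df.dtypes)

# Select one name's rows; column 1 was inferred as an integer column,
# so it can be divided directly.
mike = df[df[0] == "mike"]
mike_time = mike[1] / 1000
print(mike_time)
```

If df.dtypes shows a column with a non-numeric type where numbers were expected, that column is the one to pass explicitly through dtype on a second run.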
Upvotes: 0