Reputation: 1
I am trying to generate a plot from two columns in a .csv file. The x-axis column holds dates in the short date format mm/dd/yyyy, while the y-axis column holds absorption measurements as ordinary numerical values. I am also trying to fit a linear regression line to this plot. Here is what I have so far:
from datetime import datetime
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

mydateparser = lambda x: datetime.strptime(x, '%m/%d/%y')
df = (pd.read_csv('calibrationabs200211.csv', index_col=[], parse_dates=[0],
                  infer_datetime_format=True, date_parser=mydateparser))
if mydateparser == '%m/%d/%y':
    print('Error')
else:
    mydateparser = float(mydateparser)
plt.figure(figsize=(15,7.5))
x = df.iloc[:, 0].values.reshape(-1, 1)
y = df.iloc[:, 1].values.reshape(-1, 1)
linear_regressor = LinearRegression()
linear_regressor.fit(x, y)
y_pred = linear_regressor.predict(y)
plt.scatter(x, y, color='teal')
plt.plot(x, y_pred, color='teal')
plt.show()
However, I am getting an error message:
TypeError                                 Traceback (most recent call last)
<ipython-input-272-d087bdc00150> in <module>
     12     print('Error')
     13 else:
---> 14     mydateparser = float(mydateparser)
     15 
     16 plt.figure(figsize=(15,7.5))

TypeError: float() argument must be a string or a number, not 'function'
Furthermore, if I comment out the if statement, I do get a plot, but the linear regression line is wrong. I am fairly new to Python, matplotlib, and pandas, so any help or feedback is greatly appreciated. Thank you!
Upvotes: 0
Views: 1866
Reputation: 1578
In Python, functions are objects that can be assigned to variables, which is what you are doing here: mydateparser holds the function itself. If you want to use the result of a function, you need to call it by adding parentheses, with its arguments, after the function name.
mydateparser is a function, while mydateparser(s) is the result of calling that function on a date string s.
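A minimal illustration of the difference between the function and its result. One assumption beyond the question: since the dates are described as mm/dd/yyyy, this uses %Y (four-digit year); the question's %y only matches two-digit years, and datetime.strptime('02/11/2020', '%m/%d/%y') raises a ValueError because the trailing "20" is left unparsed. The sample date string is made up.

```python
from datetime import datetime

# Same kind of parser as in the question, but with %Y for four-digit years
mydateparser = lambda x: datetime.strptime(x, '%m/%d/%Y')

# Without parentheses you have the function object itself
print(callable(mydateparser))  # True

# With parentheses (and an argument) you get the result: a datetime
parsed = mydateparser('02/11/2020')
print(parsed)  # 2020-02-11 00:00:00

# float() rejects the function object, reproducing the question's TypeError
error_message = None
try:
    float(mydateparser)
except TypeError as exc:
    error_message = str(exc)
print('TypeError:', error_message)
```

The exact wording of the TypeError message differs slightly between Python versions, but it always names 'function' as the offending type.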
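Additionally, I don't think the comparison you're making makes sense. As written, it compares the function object itself to a string, which is always False, so the else branch (and the failing float() call) always runs. Even if you called the function, datetime.strptime returns a datetime object, and comparing that to a string is also always False. I'm actually not sure what you're trying to accomplish with that block at all.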
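A short sketch of why that check never triggers. If the intent was to detect dates that fail to parse (an assumption, the question doesn't say), one option is to try the parse and catch ValueError; is_valid_date below is a hypothetical helper, not part of any library. It uses %Y for four-digit years, matching the mm/dd/yyyy format described in the question.

```python
from datetime import datetime

mydateparser = lambda x: datetime.strptime(x, '%m/%d/%Y')

# The question's check compares the function object to a string: always False
check_result = (mydateparser == '%m/%d/%Y')
# Comparing the parsed datetime to a string is also always False
parsed_check = (mydateparser('02/11/2020') == '%m/%d/%Y')
print(check_result, parsed_check)  # False False


def is_valid_date(text, fmt='%m/%d/%Y'):
    """Hypothetical helper: True if text parses with fmt, False otherwise."""
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True


print(is_valid_date('02/11/2020'))  # True
print(is_valid_date('2020-02-11'))  # False (wrong layout for this format)
```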
Your regression needs the dates converted to a numeric value to regress against. I would suggest using matplotlib's date conversion functions, specifically date2num, which turns datetimes into floats (the number of days since matplotlib's date epoch).
Should be something along the lines of:
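from matplotlib import dates
...
# Convert the parsed dates (first column) to floats, then reshape to the
# 2-D (n_samples, 1) shape that LinearRegression expects
x = df.iloc[:, 0].apply(dates.date2num).values.reshape(-1, 1)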
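Putting the pieces together, a minimal end-to-end sketch. Assumptions: the CSV is replaced by a small hypothetical DataFrame with made-up, exactly linear values so the fit is easy to check, and the column names 'date' and 'absorbance' are invented. It also passes x (not y) to predict(), since predict() needs the same kind of features the model was fitted on; the question's predict(y) is a likely cause of the faulty regression line.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import dates
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for calibrationabs200211.csv (made-up, exactly linear values)
df = pd.DataFrame({
    'date': pd.to_datetime(['02/01/2020', '02/06/2020', '02/11/2020', '02/16/2020'],
                           format='%m/%d/%Y'),
    'absorbance': [0.10, 0.20, 0.30, 0.40],
})

# Dates -> floats (days since matplotlib's epoch); sklearn wants a 2-D X
x = df.iloc[:, 0].apply(dates.date2num).to_numpy().reshape(-1, 1)
y = df.iloc[:, 1].to_numpy()

linear_regressor = LinearRegression()
linear_regressor.fit(x, y)
y_pred = linear_regressor.predict(x)  # predict from the features (x), not from y

plt.figure(figsize=(15, 7.5))
plt.scatter(x.ravel(), y, color='teal')
plt.plot(x.ravel(), y_pred, color='teal')
plt.gca().xaxis_date()  # label the float x values as dates on the axis
plt.show()
```

With the real file, df would come from pd.read_csv instead; the fitted slope (linear_regressor.coef_) is then in absorbance units per day.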
Upvotes: 1
Reputation: 989
At the start of the code, you declared mydateparser as a lambda function, so the name refers to the function itself. But float() only accepts strings or numbers, hence the TypeError.
I assume you are using the raw date column directly as a feature for the linear regression model, which doesn't make sense.
Instead, you can derive new features like month, year, day of month, and weekday/weekend to use in the linear regression.
If you are looking to predict values for future dates, you can look at time series forecasting models.
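A sketch of deriving those features with pandas' .dt accessor. The dates are hypothetical, and which of these features actually help the regression depends on your data.

```python
import pandas as pd

# Hypothetical dates standing in for the first CSV column
df = pd.DataFrame({
    'date': pd.to_datetime(['02/07/2020', '02/08/2020', '02/11/2020'], format='%m/%d/%Y'),
})

# One numeric column per derived feature
features = pd.DataFrame({
    'year': df['date'].dt.year,
    'month': df['date'].dt.month,
    'day': df['date'].dt.day,
    'weekday': df['date'].dt.weekday,           # Monday=0 ... Sunday=6
    'is_weekend': df['date'].dt.weekday >= 5,   # Saturday or Sunday
})
print(features)
```

The resulting columns (cast with features.astype(float) if needed) can be passed to LinearRegression.fit as the feature matrix in place of the raw dates.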
Upvotes: 0