Reputation: 101
I know this has been asked like 100 times but I still don't get it and the given solutions don't get me anywhere.
I'm trying to convert timestamps into a comparable format with Pandas/Python. My data comes from database entries, and I currently have trouble using timestamps like this:
52 2017-08-04 12:26:56.348698
53 2017-08-04 12:28:22.961560
54 2017-08-04 12:34:20.299041
The goal is to use it as year1 and year2 to make a graph like:
def sns_compare(year1,year2):
    f, (ax1) = plt.subplots(1, figsize=LARGE_FIGSIZE)
    for yr in range(int(year1),int(year2)):
        sns.distplot(tag2[str(yr)].dropna(), hist=False, kde=True, rug=False, bins=25)
sns_compare(year1,year2)
When I try to do it like this I get ValueError: invalid literal for int() with base 10: '2017-08-04 12:34:20.299041'.
So currently I'm thinking about using regex to manipulate the time fields, but I can't imagine this is the way to go. I tried all kinds of suggestions from SO/GitHub but nothing really worked. I also don't know what the "optimal" time structure should look like. Is it 20170804123420299041 or something like 2017-08-04-12-34-20-299041? I hope somebody can make this clear to me.
Upvotes: 1
Views: 571
Reputation: 13672
This is your data:
from matplotlib import pyplot as plt
from datetime import datetime
import pandas as pd
df = pd.DataFrame([("2017-08-04 12:26",56.348698),("2017-08-04 12:28",22.961560),("2017-08-04 12:34",20.299041)])
df.columns = ["date", "val"]
First, we convert the column to datetime, then we subtract January 1 of year1, and finally we convert the difference to days.
# year1 and year2 are assumed to be integers defined beforehand
df['date'] = pd.to_datetime(df["date"])
df["days"] = (df['date'] - datetime(year1, 1, 1)).dt.total_seconds() / 86400.0
Plot the data, and display only the days between year1 and year2:
plt.scatter(df["days"],df["val"])
plt.xlim((0,(year2-year1)*365))
plt.show()
Upvotes: 1
Reputation: 5641
Have you looked at pd.to_datetime? Pandas and Seaborn should be able to handle dates fine, and you don't have to convert them to integers.
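A minimal sketch of what that could look like, using the three timestamps from the question. The column name "created" and the helper rows_between are made up for illustration and are not from the original post; the half-open year range mirrors the range(int(year1), int(year2)) loop in the question.

```python
import pandas as pd

# Timestamps copied from the question; the column name "created" is an assumption.
df = pd.DataFrame({
    "created": [
        "2017-08-04 12:26:56.348698",
        "2017-08-04 12:28:22.961560",
        "2017-08-04 12:34:20.299041",
    ]
})

# Parse the strings into datetime64 values, which compare and sort directly.
df["created"] = pd.to_datetime(df["created"])

# Pull out the year as an integer for year-based filtering or grouping.
df["year"] = df["created"].dt.year


def rows_between(frame, year1, year2):
    """Hypothetical helper: rows whose year lies in [year1, year2)."""
    mask = (frame["year"] >= year1) & (frame["year"] < year2)
    return frame[mask]


selected = rows_between(df, 2017, 2018)
```

With the column parsed this way there is no need for regex or a custom string layout: comparisons such as df["created"] > "2017-08-04 12:30" work on the datetime values themselves.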
Upvotes: 1