Convert date in pandas

Question

I know this has been asked like 100 times but I still don't get it and the given solutions don't get me anywhere.

I'm trying to convert time into a comparable format with Pandas/Python. I use database entries as data, and currently I have trouble working with timestamps like these:

52   2017-08-04 12:26:56.348698
53   2017-08-04 12:28:22.961560
54   2017-08-04 12:34:20.299041

The goal is to use them as year1 and year2 to make a graph like this:

def sns_compare(year1,year2):
    f, (ax1) = plt.subplots(1, figsize=LARGE_FIGSIZE)
    for yr in range(int(year1),int(year2)):
        sns.distplot(tag2[str(yr)].dropna(), hist=False, kde=True, rug=False, bins=25)
sns_compare(year1,year2)

When I try to do it like this I get ValueError: invalid literal for int() with base 10: '2017-08-04 12:34:20.299041'.

So currently I am thinking about using regex to manipulate the time fields, but this can't be the way to go, or at least I can't imagine it is. I tried all kinds of suggestions from SO/GitHub but nothing really worked. I also don't know what the "optimal" time structure should look like: is it 20170804123420299041 or something like 2017-08-04-12-34-20-299041? I hope somebody can make this clear to me.

Uri Goren · Accepted Answer

This is your data:

from matplotlib import pyplot as plt
from datetime import datetime
import pandas as pd
df = pd.DataFrame([("2017-08-04 12:26",56.348698),("2017-08-04 12:28",22.961560),("2017-08-04 12:34",20.299041)])
df.columns = ["date", "val"]

First, we convert the column to datetime, then subtract January 1 of year1, and finally express the difference in days.

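year1, year2 = 2017, 2018  # example values (assumed); the question does not define them
df['date'] = pd.to_datetime(df["date"])
# days elapsed since January 1 of year1, as a float
df["days"] = (df['date'] - datetime(year1, 1, 1)).dt.total_seconds() / 86400.0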
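Applied to the full timestamps from the question, the same conversion could be sketched as follows. The column name `created` is hypothetical, and the three rows are copied from the question's printout. Once the column is a datetime, `.dt.year` yields plain integers that work with `int()` or `range()`, and the values compare directly against a `pd.Timestamp`, so no regex or custom string format is needed.

```python
import pandas as pd

# Sample timestamps copied from the question; the column name "created" is hypothetical.
df = pd.DataFrame({
    "created": [
        "2017-08-04 12:26:56.348698",
        "2017-08-04 12:28:22.961560",
        "2017-08-04 12:34:20.299041",
    ]
})

# Parse the strings into datetime64 values.
df["created"] = pd.to_datetime(df["created"])

# Integer year of each row, usable in comparisons or range().
df["year"] = df["created"].dt.year

# Datetime values compare directly against a Timestamp.
later_rows = df[df["created"] > pd.Timestamp("2017-08-04 12:27:00")]

print(df["year"].tolist())
print(len(later_rows))
```

Keeping the column as a datetime, rather than reformatting it into a digit string, is what makes it comparable.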

Plot the data and display only the days between year1 and year2 (the factor 365 is an approximation that ignores leap years):

plt.scatter(df["days"],df["val"])
plt.xlim((0,(year2-year1)*365))
plt.show()
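If the exact span matters, one possible alternative to multiplying by 365 is to let pandas compute the number of days between the two January 1 dates and to select the rows with a boolean mask. This is only a sketch: the values of `year1` and `year2` and the sample rows below are made up for illustration.

```python
import pandas as pd

year1, year2 = 2016, 2018  # example values (assumed)

# Made-up sample rows, two of them outside the requested range.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2015-12-31 23:00",
        "2016-06-01 00:00",
        "2017-08-04 12:26",
        "2018-01-01 00:00",
    ]),
    "val": [1.0, 2.0, 3.0, 4.0],
})

start = pd.Timestamp(f"{year1}-01-01")
end = pd.Timestamp(f"{year2}-01-01")

# Exact number of days in the range, leap years included.
span_days = (end - start).days

# Keep rows in the half-open interval [start, end).
mask = (df["date"] >= start) & (df["date"] < end)
selected = df.loc[mask]

print(span_days)
print(selected)
```

`span_days` could then replace `(year2-year1)*365` in the `plt.xlim` call, and `selected` could be plotted instead of the full frame.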
