nrvaller
nrvaller

Reputation: 373

Pandas read_csv converting string to datetime and saving to database Python

I'm currently reading from a csv file with pandas. I'd like for the column that has a string which resembles datetime to be turned into a DateTime object. I have restrictions on my database that specify that column should be of such type.

an Example row from my csv file is:

2015-11-28 00:10:00, 99, 21

I originaly faced the problem when I had the following.

    # import CSV file
    df = pd.read_csv(filename, error_bad_lines=False, warn_bad_lines=False)

    # TODO  Save data into db
    # ADD time, banana, apple
    for i in df.index:
        m = AggregatedMeasurement(
            time=df.loc[i, 'Time UTC'],
            banana=df.loc[i, 'BANANA'],
            apple=df.loc[i, 'APPLE']
        )
        db.session.add(m)  # add to db
    db.session.commit()  # perform them all

I received the following error.

StatementError: (builtins.TypeError) SQLite DateTime type only accepts Python datetime and date objects as input

I started searching here and found that one can indicate date_parser and started combinations. This is what I have tried so far with the help of this suggestion. Can pandas automatically recognize dates

    dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')

    # import CSV file
    df = pd.read_csv(filename, error_bad_lines=False, warn_bad_lines=False, parse_dates=['datetime'],
                     date_parser=dateparse)

and now I am getting the following error

ValueError: 'datetime' is not in list

Am I implementing this wrong? Thanks in advance!

UPDATE 1

After the following changes

dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')

# import CSV file
    df = pd.read_csv(filename, error_bad_lines=False,
                      warn_bad_lines=False,
                      parse_dates=['Time UTC'],
                      date_parser=dateparse)

I got a TypeError: 'NoneType' object is not subscriptable on this specific part date_parser=dateparse)

UPDATE 2 Later, I tried the next suggestion.

    df = pd.read_csv(filename, error_bad_lines=False,
                     warn_bad_lines=False)

    # Changing datatype
    df['Time UTC'] = pd.to_datetime(df['Time UTC'])

And I think that worked :) I got another error but that corresponds to another database restriction relationship/ForeignKey that I haven't set up yet.

Upvotes: 3

Views: 7618

Answers (1)

arnaud
arnaud

Reputation: 3483

Could be a simple mistake. I think you should do this :

df = pd.read_csv(filename, 
                 error_bad_lines=False, 
                 warn_bad_lines=False, 
                 parse_dates=['Time UTC'],
                 date_parser=dateparse)

Indeed, you should refer parse_dates to 'Time UTC' which is the column of interest, and not 'datetime'.

Otherwise, you may more easily let pd.to_datetime() handle the task, using :

df = pd.read_csv(filename, error_bad_lines=False,
                     warn_bad_lines=False)

# Changing datatype
df['Time UTC'] = pd.to_datetime(df['Time UTC'])

Upvotes: 2

Related Questions