Relaxed1

Reputation: 1012

Dataframe datetime column as argument to function

When I pass multiple DataFrame columns as arguments to a function, the datetime column fails inside the formatting function below. I can get by with the inline solution shown, but I would like to know the reason. Should I have used a different date-like data type, for example? Thanks (P.S. pandas is great.)

import pandas as pd
import numpy as np
import datetime as dt

def fmtfn(arg_dttm, arg_int):
    retstr = arg_dttm.strftime('%Y-%m-%d') + '~{:0>3}'.format(arg_int)
    # bombs with: 'numpy.datetime64' object has no attribute 'strftime'

#     retstr = '{:%Y-%m-%d}~{:0>3}'.format(arg_dttm, arg_int)
    # bombs with: invalid format specifier
    return retstr

def fmtfn2(arg_dtstr, arg_int):
    retstr = '{}~{:0>3}'.format(arg_dtstr, arg_int)
    return retstr


# The source data.
# I want to add a 3rd column newhg that carries e.g. 2017-06-25~066
# i.e. a concatenation of the other two columns. 
df1 = pd.DataFrame({'mydt': ['2017-05-07', '2017-06-25', '2015-08-25'],
                    'myint': [66, 201, 100]})


df1['mydt'] = pd.to_datetime(df1['mydt'], errors='raise')


# THIS WORKS (without calling a function)
print('\nInline solution')
df1['newhg'] = df1[['mydt', 'myint']].apply(lambda x: '{:%Y-%m-%d}~{:0>3}'.format(x['mydt'], x['myint']), axis=1)
print(df1)

# THIS WORKS
print('\nConvert to string first')
df1['mydt2'] = df1['mydt'].apply(lambda x: x.strftime('%Y-%m-%d'))
df1['newhg'] = np.vectorize(fmtfn2)(df1['mydt2'], df1['myint'])  
print(df1)

# Bombs in the function - see above
print('\nPass a datetime')
df1['newhg'] = np.vectorize(fmtfn)(df1['mydt'], df1['myint'])  
print(df1)

Upvotes: 0

Views: 470

Answers (1)

Jan Zeiseweis

Reputation: 3738

You could also use pandas' built-in datetime and string accessors, which make this a bit easier to read:

df1['newhg'] = df1.mydt.dt.strftime('%Y-%m-%d') + '~' + df1.myint.astype(str).str.zfill(3)
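
As for the reason, a likely explanation, consistent with the error message in the question: np.vectorize works on NumPy arrays, and a datetime column viewed as a NumPy array holds numpy.datetime64 scalars. Those have no strftime method and do not accept strftime-style format specs, unlike the pandas Timestamp objects that a row-wise apply hands to the lambda. The sketch below checks this and shows one way to keep a function-based approach by wrapping the value in pd.Timestamp. The helper name fmtfn_ts is hypothetical, and the exact behaviour of np.vectorize may vary between NumPy and pandas versions.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'mydt': ['2017-05-07', '2017-06-25', '2015-08-25'],
                    'myint': [66, 201, 100]})
df1['mydt'] = pd.to_datetime(df1['mydt'], errors='raise')

# Element taken from the underlying NumPy array: a numpy.datetime64 scalar.
raw_value = df1['mydt'].to_numpy()[0]
# Element taken through pandas: a pandas Timestamp (a datetime subclass).
pandas_value = df1['mydt'].iloc[0]
print(type(raw_value), type(pandas_value))


def fmtfn_ts(arg_dttm, arg_int):
    # Hypothetical helper: pd.Timestamp accepts numpy.datetime64 values
    # (and Timestamps), and it supports strftime-style format specs.
    return '{:%Y-%m-%d}~{:0>3}'.format(pd.Timestamp(arg_dttm), arg_int)


# Function-based approach, kept close to the question's np.vectorize call.
via_vectorize = np.vectorize(fmtfn_ts)(df1['mydt'], df1['myint'])

# Plain Python alternative: iterating a datetime Series yields Timestamps.
via_zip = ['{:%Y-%m-%d}~{:0>3}'.format(d, i)
           for d, i in zip(df1['mydt'], df1['myint'])]

# Accessor-based approach from the answer above.
via_pandas = (df1['mydt'].dt.strftime('%Y-%m-%d') + '~'
              + df1['myint'].astype(str).str.zfill(3))

df1['newhg'] = via_pandas
print(df1)
```

All three approaches should produce the same strings, e.g. 2017-06-25~201. The accessor version avoids a Python-level call per row, so it is usually the better choice on larger frames; a different date-like dtype should not be needed.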

Upvotes: 1
