jim jarnac

Reputation: 5152

Pandas - set datetime / Timestamp value with apply

Let's consider the following DataFrame:

a=pd.DataFrame(data=list(range(10)))

I am trying to create a new column from it with apply. The column is supposed to contain Timestamp objects:

def test(x):
    x["date"]=pd.Timestamp("2017-01-01")
    return x

a.apply(test,axis=1)

However, this results in a DataFrame containing the Timestamp's numerical .value attribute instead:

    0   date
0   0   1483228800000000000
1   1   1483228800000000000
2   2   1483228800000000000
3   3   1483228800000000000
4   4   1483228800000000000
5   5   1483228800000000000
6   6   1483228800000000000
7   7   1483228800000000000
8   8   1483228800000000000
9   9   1483228800000000000

Why does this happen? How can I get the correct Timestamp in the DataFrame?


Edit: here is the full code that is giving me the problem. I include it so that respondents understand I am not simply trying to assign a plain list of datetimes to a new column:

def mae(x):
    entry=x.name[0]
    exit=x.name[1]
    m=d.loc[entry:exit,"close"]
    if x.dir==-1:
        r=(m.sub(m[::-1].cummax()[::-1])).abs().max()
        imax=(m.sub(m[::-1].cummax()[::-1])).idxmin()
    elif x.dir==1:
        r=(m.sub(m[::-1].cummin()[::-1])).abs().max()
        imax=(m.sub(m[::-1].cummin()[::-1])).idxmax()
    else:
        r=0
        imax=pd.NaT  # no peak when dir is neither -1 nor 1
    x['mae']=r*gbp['multiplier']
    x['peak']=imax 
    return x

k=g.head().apply(mae,axis=1)

This takes as input another DataFrame and some financial price data stored in a database, so it is difficult to invent an example that is simple enough for people to grasp quickly yet complicated enough to justify the use of .apply.

I think there is either something I don't understand or a bug in .apply; this is what I would like input and help on. Thanks!

Upvotes: 0

Views: 2648

Answers (2)

stucash

Reputation: 1258

After a few rounds of debugging your code and stepping into pandas, it seems there is something you can improve in your code.

You can read more details in _setitem_with_indexer (lines 387-393) and in numpy.concatenate.

In short, _setitem_with_indexer uses numpy.concatenate as part of setting new values through an indexer.

Because the first column of your DataFrame a held only integers, setting the new "date" entry meant concatenating an integer with a pandas.Timestamp (which is actually a numpy.datetime64 at that point), and numpy simply refuses to do that.

The following examples illustrate this.

Setup

import pandas as pd
import numpy as np

s1 = [1]

s2 = np.array([np.datetime64("2017-01-01")])

s3 = [pd.Timestamp("2017-01-01")]

a = pd.DataFrame(data=pd.date_range("01-01-2017", "01-06-2017"))

b = pd.DataFrame(data=["d", "c", "d", "d"])

c = pd.DataFrame(data=list(range(10))) 

def test(x):
    x["date"]=pd.Timestamp("2017-01-01")
    return x

Trials

a.apply(test, axis=1)

# output 

           0       date
0 2017-01-01 2017-01-01
1 2017-01-02 2017-01-01
2 2017-01-03 2017-01-01
3 2017-01-04 2017-01-01
4 2017-01-05 2017-01-01
5 2017-01-06 2017-01-01

b.apply(test, axis=1)

# output

   0                 date
0  d  1483228800000000000
1  c  1483228800000000000
2  d  1483228800000000000
3  d  1483228800000000000

c.apply(test,axis=1)

# output

   0                 date
0  0  1483228800000000000
1  1  1483228800000000000
2  2  1483228800000000000
3  3  1483228800000000000
4  4  1483228800000000000
5  5  1483228800000000000
6  6  1483228800000000000
7  7  1483228800000000000
8  8  1483228800000000000
9  9  1483228800000000000

I think this is mostly the behaviour of numpy.concatenate. With pd.Timestamp the concatenation succeeds, but with numpy.datetime64, which is what _setitem_with_indexer uses, it does not.

Observation

np.concatenate([s1,s2])

# output
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid type promotion

np.concatenate([s1,s3])

# output

array([1, Timestamp('2017-01-01 00:00:00')], dtype=object)
Note that the dtype here is object.

As for cleanness or elegance, I think Wen's comment is already very minimal, but you might have a reason for using apply.

One more thing to point out: apply takes a function that operates on each column or row, but it seems you were trying to use the function argument as a DataFrame.
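To make that concrete, here is a minimal sketch of two ways to avoid mutating the row inside apply. The first assigns the constant Timestamp directly. The second is for the case where the date really depends on each row: the applied function returns only a scalar, and the resulting Series is assigned as a column. The helper name peak_date and its date arithmetic are hypothetical stand-ins, not the logic from the question.

```python
import pandas as pd

a = pd.DataFrame(data=list(range(10)))

# Constant timestamp: assign the scalar directly; pandas broadcasts it
# to every row and stores the column with a datetime dtype.
a["date"] = pd.Timestamp("2017-01-01")

# Row-dependent timestamp: return one scalar per row instead of
# adding a new entry to the row that apply passes in.
def peak_date(row):
    # hypothetical rule: offset a base date by the value in column 0
    return pd.Timestamp("2017-01-01") + pd.Timedelta(days=int(row.loc[0]))

a["peak"] = a.apply(peak_date, axis=1)
print(a.dtypes)
```

Because the function returns a scalar, the new column is built in one step from a sequence of Timestamps, rather than by appending a Timestamp to each integer row.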

Upvotes: 3

sfjac

Reputation: 7294

I think you just need to tell pandas that the column should be interpreted as a datetime:

a = pd.DataFrame(data=list(range(10)))
def test(x):
    x['date'] = pd.Timestamp('2017-01-01')
    return x

a = a.apply(test, axis=1)
a.date = a.date.astype('datetime64[ns]')

This gives:

    0   date
0   0   2017-01-01
1   1   2017-01-01
2   2   2017-01-01
3   3   2017-01-01
4   4   2017-01-01
5   5   2017-01-01
6   6   2017-01-01
7   7   2017-01-01
8   8   2017-01-01
9   9   2017-01-01

Alternatively, I was able to do this by creating the column first and setting its type:

a = pd.DataFrame(data=list(range(10)))
a['date'] = None
a['date'] = a['date'].astype('datetime64[ns]')  # astype returns a new Series, so assign it back

def test(x):
    x['date'] = pd.Timestamp('2017-01-01')
    return x

a = a.apply(test, axis=1)
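If the column has already come out as integers, those values are nanoseconds since the Unix epoch (the Timestamp's .value), so another option is to convert them back with pd.to_datetime. A minimal sketch, where the integer column is built by hand as a stand-in for the frame in the question, so it does not depend on reproducing the apply behaviour:

```python
import pandas as pd

# Stand-in for the frame shown in the question: the "date" column holds
# nanosecond counts rather than Timestamp objects.
a = pd.DataFrame({0: list(range(10)), "date": [1483228800000000000] * 10})

# Interpret the integers as nanoseconds since the epoch.
a["date"] = pd.to_datetime(a["date"], unit="ns")
print(a.head())
```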

Upvotes: 0
