ckaiwu

Reputation: 23

Pandas v0.13.0: Setting DataFrame values of type datetime64[ns]

I recently updated Pandas to v0.13.0, and it seems to have introduced problems with datetime-typed data.

Let's take this example where we have a dataframe with one column of datetime64[ns] and one column of int32.

import pandas as pd
import numpy as np

t  = pd.date_range('2000-01-01','2000-01-20')        
v  = np.arange(0,len(t))
df = pd.DataFrame({'date':t,'val':v})

First, let's set each column to be a scalar value of the same data type.

# SETTING SCALAR OF SAME TYPE
df.loc[:,'val']  = v[0] # Works fine
df.loc[:,'date'] = t[0] # Works fine

Pandas correctly broadcasts the data. No problem with either column.

Second, let's try to replace with a scalar of a different data type:

# SETTING SCALAR, BUT OF DIFFERENT DTYPE
df.loc[:,'val']  = t[0] # Works fine
df.loc[:,'date'] = v[0] # Does not work?

While the first operation is successful, the second gives an error: "ValueError: new type not compatible with array."

Third, let's try replacing each column with a vector of data (without changing the data type):

df = pd.DataFrame({'date':t,'val':v})

# SETTING VECTOR
df.loc[:,'val']  = v * 2 # Works fine
df.loc[:,'date'] = t.shift(365) # Does not work?

Again, the first operation works. But the second operation fails, with the error: "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"

Does anyone know what's going on here? It may be two separate issues. Thanks for the help!

EDIT: Thanks to Jeff for providing correct answers to the above questions. His responses do, however, raise one (hopefully) final question:

How do I assign to a subset of a DataFrame, where the subset spans multiple rows and columns and at least one column is of type datetime64?

For instance:

t  = pd.date_range('2000-01-01','2000-01-20')        
v  = np.arange(0,len(t))
df = pd.DataFrame({'date':t,'val':v,'val2':v})

# USING LABELS
df.loc[4:7,['val','val2']] = df.loc[4:7,['val','val2']] # Works fine
df.loc[4:7,['date','val']] = df.loc[4:7,['date','val']] # Does not work?

# USING ROW SLICE
df[4:7] = df[4:7]                                       # Does not work?

# USING BOOLEAN ROW MASK
mask = np.array([True] * len(df))
mask[[1,4,8]] = False
df[mask] = df[mask]                                     # Does not work?

While Jeff's solution of using df[col] = val rather than df.loc[:,col] = val correctly resolves my original problem (column-wise assignment), it does not help with row-based (or row-and-column-based) assignment.

Thank you.

Upvotes: 2

Views: 587

Answers (1)

Jeff

Reputation: 129018

Do your operations as direct column assignment.

In [40]: df['date'] = v[0]

In [41]: df
Out[41]: 
    date  val
0      0    0
1      0    1
2      0    2
3      0    3
4      0    4
5      0    5
6      0    6
7      0    7
8      0    8
9      0    9
10     0   10
11     0   11
12     0   12
13     0   13
14     0   14
15     0   15
16     0   16
17     0   17
18     0   18
19     0   19

[20 rows x 2 columns]

In [42]: df = pd.DataFrame({'date':t,'val':v})

In [43]: df['date'] = t.shift(365)

In [44]: df
Out[44]: 
         date  val
0  2000-12-31    0
1  2001-01-01    1
2  2001-01-02    2
3  2001-01-03    3
4  2001-01-04    4
5  2001-01-05    5
6  2001-01-06    6
7  2001-01-07    7
8  2001-01-08    8
9  2001-01-09    9
10 2001-01-10   10
11 2001-01-11   11
12 2001-01-12   12
13 2001-01-13   13
14 2001-01-14   14
15 2001-01-15   15
16 2001-01-16   16
17 2001-01-17   17
18 2001-01-18   18
19 2001-01-19   19

[20 rows x 2 columns]

Doing something like df.loc[:,'date'] = val looks similar, but what you are actually saying is not "replace this column with what is on the right-hand side" but rather "overwrite the existing values using the row mask" (the null slice : in this case, so every row is selected). The dtype conversion is not done here because it could potentially be a very expensive operation.

When you are simply setting a new column, prefer the straight setitem: df[col] = val

This is not a bug but a deliberate choice. I think I will add a doc note about it, as this is the second question I have seen on this topic and I guess it's a bit confusing.
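For reference, here is a minimal self-contained sketch of the direct column assignment recommended above, covering both the scalar and the vector case from the question. The names df_scalar and df_vector are only illustrative, and the dtype checks are deliberately loose because the exact integer width and datetime resolution can depend on the platform and the pandas version.

```python
import numpy as np
import pandas as pd

t = pd.date_range('2000-01-01', '2000-01-20')
v = np.arange(0, len(t))

# Scalar of a different dtype: plain setitem replaces the whole column,
# so the datetime column simply becomes an integer column.
df_scalar = pd.DataFrame({'date': t, 'val': v})
df_scalar['date'] = v[0]

# Vector of the same dtype: plain setitem again replaces the column.
df_vector = pd.DataFrame({'date': t, 'val': v})
df_vector['date'] = t.shift(365)

print(df_scalar.dtypes)
print(df_vector.head())
```

Because df[col] = val swaps in a new column rather than writing into the existing one, no in-place dtype conversion is needed.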

Upvotes: 2
