Reputation: 24314
Let's say I have this DataFrame:
import pandas as pd
df=pd.DataFrame([[2, 3], [4, 5], [6, 7]], index=pd.MultiIndex.from_tuples([
(pd.Timestamp('2019-07-01 23:00:00'), pd.Timestamp('2019-07-01 23:00:00'), 0),
(pd.Timestamp('2019-07-02 00:00:00'), pd.Timestamp('2019-07-02 00:00:00'), 0),
(pd.Timestamp('2019-07-02 00:00:00'), pd.Timestamp('2019-07-02 00:00:00'), 0)],
names=['dt_calc', 'dt_fore', 'positional_index']), columns=['temp', 'temp_2'])
idx = df.index[0]
Now I want to replace a cell with a list object (I know that storing complex objects in pandas columns is generally not good practice), so I do:
df.loc[idx, 'temp'] = pd.Series([[1, 2, 3]], index=[idx])
# This works fine, as expected
# This is just an example; it has nothing to do with my actual question
But if I try to assign a list or a nested list like this, it throws an error (as expected):
df.loc[idx,'temp']=[1,2,3]
df.loc[idx,'temp']=[[1,2,3]]
df.loc[idx,'temp']=[[[1,2,3]]]
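For comparison with the failing assignments above, a commonly used workaround (not part of the question, and shown here on a hypothetical small frame with a plain, unique integer index rather than the MultiIndex) is to make the column object dtype first and then set exactly one cell with .at, so pandas stores the list as a single value instead of trying to unpack it. This is a sketch under those assumptions, not an explanation of the .loc behaviour being asked about:

```python
import pandas as pd

# Hypothetical small frame with a simple, unique integer index
df = pd.DataFrame({"temp": [2, 4, 6], "temp_2": [3, 5, 7]})

# Upcast the column to object so a cell can hold an arbitrary Python object
df["temp"] = df["temp"].astype(object)

# .at addresses exactly one cell, so the list is stored as one value
df.at[0, "temp"] = [1, 2, 3]

print(df["temp"].dtype)
print(df.at[0, "temp"])
```

The explicit astype(object) step makes the dtype change deliberate instead of depending on which assignment happened to run first.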
But if I now assign a list of 3 or more dimensions containing strings:
df.loc[idx,'temp']=[[['1','2','3']]]
# This works
# And if I then assign a list of 3 or more dimensions containing ints, after running the code above:
df.loc[idx,'temp']=[[[1,2,3]]]
# This also works
Based on the observations above (I am using Python 3.9 and pandas 1.3.0), my question is:
what explains this behaviour of the .loc accessor?
Upvotes: 2
Views: 143
Reputation: 3929
I can only answer this question partially and can't provide the technical details, but hopefully this gives some idea of what is happening.
Pandas optimizes A LOT under the hood.
In the case of
df.loc[idx,'temp']=[1,2,3]
pandas automatically detects that you want to set a NumPy-array-like structure on a column of dtype int64, so it switches to a fast track optimized for arrays. It then detects that you provided too many elements and throws an error.
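The error quoted elsewhere in this thread, "setting an array element with a sequence.", is NumPy's message. As a rough analogy only (this does not trace pandas' actual internal code path), the same constraint shows up with a plain int64 NumPy array: a single fixed-width integer slot cannot hold a list.

```python
import numpy as np

# A fixed-width integer array, analogous to the int64 data behind the 'temp' column
values = np.array([2, 4, 6], dtype=np.int64)

try:
    # A list cannot be converted into one int64 element, so NumPy refuses
    values[0] = [1, 2, 3]
    outcome = "stored"
except ValueError as exc:
    outcome = f"ValueError: {exc}"

print(outcome)
print(values)
```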
In the case of
df.loc[idx,'temp']=[[['1','2','3']]]
pandas doesn't use the fast track here, because this is clearly not a NumPy-array-like structure. It also changes the column's dtype to object. Once the dtype is object, the .loc accessor no longer tries the fast track. That's why df.loc[idx,'temp']=[[['1','2','3']]] works, and why the integer version df.loc[idx,'temp']=[[[1,2,3]]] also works after it has run.
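To illustrate the dtype part of this explanation (again a NumPy analogy, not a trace of what .loc does internally): once the storage is object dtype, each element is just a reference to a Python object, so both the string and the integer nested lists can be stored in a single slot.

```python
import numpy as np

# Same values, but upcast to object dtype: each slot now holds a Python reference
values = np.array([2, 4, 6], dtype=np.int64).astype(object)

# With object dtype, assigning to one integer position stores the nested list as-is
values[0] = [[["1", "2", "3"]]]
first = values[0]

# After that, an int-valued nested list is stored the same way
values[0] = [[[1, 2, 3]]]
second = values[0]

print(values.dtype)
print(first, second)
```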
Upvotes: 1
Reputation: 120399
I can't fully answer this question, but it only works because the first two levels of your MultiIndex are dates:
df1=pd.DataFrame([[2, 3]],
index=pd.MultiIndex.from_tuples([('foo', 'bar', 'baz')], names=['A', 'B', 'C']),
columns=['V1', 'V2'])
idx1 = df1.index[0]
>>> idx1
('foo', 'bar', 'baz')
>>> df1.loc[idx1,'temp'] = [[[1,2,3]]]
...
ValueError: setting an array element with a sequence.
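A quick way to see the index difference this answer points to is to list each level's dtype. The helper describe_levels is a hypothetical name, not a pandas function; it only uses the MultiIndex.names and MultiIndex.levels attributes.

```python
import pandas as pd

def describe_levels(index: pd.MultiIndex) -> list[tuple[str, str]]:
    """Return (level name, dtype string) for each level of a MultiIndex."""
    return [(str(name), str(level.dtype)) for name, level in zip(index.names, index.levels)]

# The question's 3-level index: two Timestamp levels and one integer level
mi = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2019-07-01 23:00:00"), pd.Timestamp("2019-07-01 23:00:00"), 0),
        (pd.Timestamp("2019-07-02 00:00:00"), pd.Timestamp("2019-07-02 00:00:00"), 0),
    ],
    names=["dt_calc", "dt_fore", "positional_index"],
)

for name, dtype in describe_levels(mi):
    print(name, dtype)
```

Running the same helper on df1.index shows object (string) levels instead, which is the contrast the answer draws.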
If you add one more level to your DataFrame's MultiIndex, the same assignments no longer work:
df=pd.DataFrame([[2, 3], [4, 5], [6, 7]], index=pd.MultiIndex.from_tuples([
(pd.Timestamp('2019-07-01 23:00:00'), pd.Timestamp('2019-07-01 23:00:00'), 0, 1),
(pd.Timestamp('2019-07-02 00:00:00'), pd.Timestamp('2019-07-02 00:00:00'), 0, 2),
(pd.Timestamp('2019-07-02 01:00:00'), pd.Timestamp('2019-07-02 01:00:00'), 0, 3)],
names=['dt_calc', 'dt_fore', 'positional_index', 'additional_index']), columns=['temp', 'temp_2'])
idx = df.index[0]
>>> idx
(Timestamp('2019-07-01 23:00:00'), Timestamp('2019-07-01 23:00:00'), 0, 1)
>>> df.loc[idx,'temp']=[1,2,3]
...
ValueError: setting an array element with a sequence.
>>> df.loc[idx,'temp']=[[1,2,3]]
...
ValueError: setting an array element with a sequence.
>>> df.loc[idx,'temp']=[[[1,2,3]]]
...
ValueError: setting an array element with a sequence.
>>> df.loc[idx,'temp']=[[[[1,2,3]]]]
...
ValueError: setting an array element with a sequence.
The answer probably lies somewhere in this behaviour. You could open an issue on GitHub.
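If you do open a GitHub issue, a small self-contained reproducer helps. The sketch below uses a hypothetical helper, try_assignments (not part of pandas), that replays the assignments from the question in order on one copy of the frame, since the question shows that order matters, and records either the resulting column dtype or the exception. It deliberately assumes nothing about which outcome a given pandas version produces.

```python
import pandas as pd

def try_assignments(df: pd.DataFrame, idx, column: str, candidates: list) -> list[str]:
    """Apply df.loc[idx, column] = value for each candidate, in order, on one copy."""
    trial = df.copy()  # work on a copy so the caller's frame is left untouched
    results = []
    for value in candidates:
        try:
            trial.loc[idx, column] = value
            results.append(f"ok -> dtype {trial[column].dtype}")
        except Exception as exc:  # record whatever this pandas version raises
            results.append(f"{type(exc).__name__}: {exc}")
    return results

# The DataFrame from the question
df = pd.DataFrame(
    [[2, 3], [4, 5], [6, 7]],
    index=pd.MultiIndex.from_tuples(
        [
            (pd.Timestamp("2019-07-01 23:00:00"), pd.Timestamp("2019-07-01 23:00:00"), 0),
            (pd.Timestamp("2019-07-02 00:00:00"), pd.Timestamp("2019-07-02 00:00:00"), 0),
            (pd.Timestamp("2019-07-02 00:00:00"), pd.Timestamp("2019-07-02 00:00:00"), 0),
        ],
        names=["dt_calc", "dt_fore", "positional_index"],
    ),
    columns=["temp", "temp_2"],
)

# Same values and order as in the question
candidates = [[1, 2, 3], [[1, 2, 3]], [[[1, 2, 3]]], [[["1", "2", "3"]]], [[[1, 2, 3]]]]
for value, outcome in zip(candidates, try_assignments(df, df.index[0], "temp", candidates)):
    print(value, "=>", outcome)

print(pd.__version__)
```

Including the printed pandas version alongside the outcomes makes the report easier to triage.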
Upvotes: 0