Greg Dougherty

Reputation: 3461

Starting with Python 3.8, Pandas won't let me reassign a value in a DataFrame

Code that works under Pandas 1.3.5 and Python 3.7 or earlier:

import pandas as pd
import numpy as np
hex_name = '123456abc'
multi_sub_dir_id_list = [hex_name, hex_name, hex_name]
multi_leaf_node_dirs = ['one', 'two', 'three'] 
x_dir_multi_index = pd.MultiIndex.from_arrays ([multi_sub_dir_id_list, multi_leaf_node_dirs], names = ['hex_name', 'leaf_name'])
leaf_name = 'one'
dirpath = '/a/string/path'
task_path_str = 'thepath'
multi_exec_df = pd.DataFrame (data = None, columns = x_dir_multi_index)
multi_exec_df.loc[task_path_str] = np.nan
multi_exec_df.loc[task_path_str][hex_name, leaf_name] = dirpath

Starting with Python 3.8, once a cell has been assigned any value, all later assignments to it are ignored. The current code is failing under Python 3.11.0 and Pandas 1.5.1.

Is this formulation no longer allowed?

What it should look like after the above:

hex_name   leaf_name
123456abc  one        /a/string/path
           two        NaN
           three      NaN

What it does look like after the above:

> multi_exec_df.loc[task_path_str]
hex_name   leaf_name
123456abc  one         NaN
           two         NaN
           three       NaN
Name: thepath, dtype: float64

What I'm running for this test:

Python 3.10.8 (main, Oct 13 2022, 09:48:40) [Clang 14.0.0 (clang-1400.0.29.102)] on darwin
print(pd.__version__)
1.5.2

Upvotes: 0

Views: 122

Answers (1)

Bill

Reputation: 11603

Here is my interpretation of what your code does.

Your setup code:

import pandas as pd
import numpy as np
hex_name = '123456abc'
multi_sub_dir_id_list = [hex_name, hex_name, hex_name]
multi_leaf_node_dirs = ['one', 'two', 'three'] 
x_dir_multi_index = pd.MultiIndex.from_arrays ([multi_sub_dir_id_list, multi_leaf_node_dirs], names = ['hex_name', 'leaf_name'])
leaf_name = 'one'
dirpath = '/a/string/path'
task_path_str = 'thepath'
multi_exec_df = pd.DataFrame (data = None, columns = x_dir_multi_index)
multi_exec_df.loc[task_path_str] = np.nan

At this point multi_exec_df is a dataframe with one row full of NaNs:

hex_name  123456abc          
leaf_name       one two three
thepath         NaN NaN   NaN

and multi_exec_df.loc[task_path_str] is a series containing the data from the first row:

hex_name   leaf_name
123456abc  one         NaN
           two         NaN
           three       NaN
Name: thepath, dtype: float64

Based on your example of "what it should look like after the above" I assume you are trying to assign the value "/a/string/path" to the column ('123456abc', 'one').

Here is how I would do that:

col = (hex_name, leaf_name)
multi_exec_df.loc[task_path_str, col] = dirpath

As far as I know, a single indexing call on the dataframe itself, using loc or a similar indexer, is the only reliable way to assign values to the dataframe. Is there a reason you can't do that here?
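For reference, here is a minimal, self-contained sketch of that single-step assignment. It shortens the variable names from the question, and it builds the frame with dtype=object up front. That dtype is my assumption, not something from the original code; I chose it so that the string fits into the column without an upcast.

```python
import pandas as pd

hex_name = "123456abc"
leaf_name = "one"
dirpath = "/a/string/path"
task_path_str = "thepath"

columns = pd.MultiIndex.from_arrays(
    [[hex_name] * 3, ["one", "two", "three"]],
    names=["hex_name", "leaf_name"],
)

# Assumption: object dtype, so the row starts as NaN but can hold strings.
df = pd.DataFrame(index=[task_path_str], columns=columns, dtype=object)

# Row label and column tuple go into ONE .loc call, so pandas writes
# into the dataframe itself rather than into an intermediate object.
col = (hex_name, leaf_name)
df.loc[task_path_str, col] = dirpath

print(df.loc[task_path_str])
```

The key point is that there is only one pair of square brackets after loc, so no intermediate Series sits between the dataframe and the assignment.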

Now to the question of what your code is doing...

Instead of the above, you are executing the following line:

multi_exec_df.loc[task_path_str][hex_name, leaf_name] = dirpath

This is equivalent to:

multi_exec_df.loc[task_path_str][(hex_name, leaf_name)] = dirpath

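The problem with it is that multi_exec_df.loc[task_path_str] can return a copy of the row from the dataframe rather than a view, so the assignment modifies a temporary object instead of the dataframe. When I execute the line above I get the following warning: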

<ipython-input-26-2d4fae3863b0>:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  multi_exec_df.loc[task_path_str][hex_name, leaf_name] = dirpath

(Maybe you knew that, but you didn't mention it, so I am pointing it out. I'm not sure why you didn't get this warning. If you are not familiar with what a view is, read the documentation linked in the warning above.)

You asked "Is this formulation no longer allowed?"

It is allowed in the sense that it runs, but you must accept that you may be assigning the new value to a copy of the row, not to the row in the original dataframe.
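To illustrate what writing to a copy means, here is a small sketch that takes the copy explicitly with .copy(). That explicit copy is a stand-in of mine for the temporary object the chained form can produce; whether pandas actually hands back a view or a copy in the chained case depends on the pandas version and the dtype layout, so this shows the mechanism rather than reproducing your exact run. The names row_copy, copy_value and frame_value are just illustrative.

```python
import pandas as pd

hex_name = "123456abc"
columns = pd.MultiIndex.from_arrays(
    [[hex_name] * 3, ["one", "two", "three"]],
    names=["hex_name", "leaf_name"],
)

# Assumption: object dtype, so the row starts as NaN but can hold strings.
df = pd.DataFrame(index=["thepath"], columns=columns, dtype=object)

# Explicit copy of the row (hypothetical stand-in for the temporary
# Series that the chained form may create).
row_copy = df.loc["thepath"].copy()
row_copy[(hex_name, "one")] = "/a/string/path"

# The copy holds the new value; the dataframe still holds NaN.
copy_value = row_copy[(hex_name, "one")]
frame_value = df.loc["thepath", (hex_name, "one")]
print(copy_value)
print(pd.isna(frame_value))
```

If the chained form ends up writing to such a copy, the new value is discarded along with the temporary object, which would match the all-NaN row you are seeing.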

I don't know whether the copy-versus-view behaviour here changed at some point in Pandas development, if that is what you are asking.

This was done with Pandas 1.5.1.

Upvotes: 1
