Reputation: 3461
Code that works under Pandas 1.3.5 and Python 3.7 or earlier:
import pandas as pd
import numpy as np
hex_name = '123456abc'
multi_sub_dir_id_list = [hex_name, hex_name, hex_name]
multi_leaf_node_dirs = ['one', 'two', 'three']
x_dir_multi_index = pd.MultiIndex.from_arrays ([multi_sub_dir_id_list, multi_leaf_node_dirs], names = ['hex_name', 'leaf_name'])
leaf_name = 'one'
dirpath = '/a/string/path'
task_path_str = 'thepath'
multi_exec_df = pd.DataFrame (data = None, columns = x_dir_multi_index)
multi_exec_df.loc[task_path_str] = np.nan
multi_exec_df.loc[task_path_str][hex_name, leaf_name] = dirpath
Starting with Python 3.8, once a row has been assigned a value, all later assignments to it are silently ignored. The current code fails under Python 3.11.0 and Pandas 1.5.1.
Is this formulation no longer allowed?
What it should look like after the above:
hex_name   leaf_name
123456abc  one          /a/string/path
           two                     NaN
           three                   NaN
What it does look like after the above:
> multi_exec_df.loc[task_path_str]
hex_name   leaf_name
123456abc  one         NaN
           two         NaN
           three       NaN
Name: thepath, dtype: float64
What I'm running for this test:
Python 3.10.8 (main, Oct 13 2022, 09:48:40) [Clang 14.0.0 (clang-1400.0.29.102)] on darwin
print(pd.__version__)
1.5.2
Upvotes: 0
Views: 122
Reputation: 11603
Here is my interpretation of what your code does.
Your setup code:
import pandas as pd
import numpy as np
hex_name = '123456abc'
multi_sub_dir_id_list = [hex_name, hex_name, hex_name]
multi_leaf_node_dirs = ['one', 'two', 'three']
x_dir_multi_index = pd.MultiIndex.from_arrays ([multi_sub_dir_id_list, multi_leaf_node_dirs], names = ['hex_name', 'leaf_name'])
leaf_name = 'one'
dirpath = '/a/string/path'
task_path_str = 'thepath'
multi_exec_df = pd.DataFrame (data = None, columns = x_dir_multi_index)
multi_exec_df.loc[task_path_str] = np.nan
At this point multi_exec_df is a DataFrame with one row full of NaNs:
hex_name  123456abc
leaf_name       one  two three
thepath         NaN  NaN   NaN
and multi_exec_df.loc[task_path_str] is a Series containing the data from that row:
hex_name   leaf_name
123456abc  one         NaN
           two         NaN
           three       NaN
Name: thepath, dtype: float64
Based on your example of "what it should look like after the above", I assume you are trying to assign the value "/a/string/path" to the column ('123456abc', 'one').
Here is how I would do that:
col = (hex_name, leaf_name)
multi_exec_df.loc[task_path_str, col] = dirpath
As far as I know, a single indexing call with loc (or a similar accessor) is the only reliable way to assign values into the DataFrame itself. Is there a reason you can't do that here?
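For completeness, here is a self-contained sketch of that single-call assignment. The names df, cols and task are mine, and creating the frame with dtype=object is my assumption rather than something from your code: the column ends up holding a string, and newer pandas releases warn about (and may reject) writing a string into a float64 column.

```python
import numpy as np
import pandas as pd

hex_name = "123456abc"
leaf_name = "one"
dirpath = "/a/string/path"
task = "thepath"

cols = pd.MultiIndex.from_arrays(
    [[hex_name] * 3, ["one", "two", "three"]],
    names=["hex_name", "leaf_name"],
)

# Assumption: object dtype, because the cells will hold path strings.
df = pd.DataFrame(np.nan, index=[task], columns=cols, dtype=object)

# One .loc call with (row label, column key): the write goes to df itself,
# not to an intermediate Series.
col = (hex_name, leaf_name)
df.loc[task, col] = dirpath

print(df.loc[task])
```

The important part is that the row label and the column tuple go into the same pair of brackets.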
Now to the question of what your code is doing...
Instead of the above, you are executing the following line:
multi_exec_df.loc[task_path_str][hex_name, leaf_name] = dirpath
This is equivalent to:
multi_exec_df.loc[task_path_str][(hex_name, leaf_name)] = dirpath
The problem with it is that multi_exec_df.loc[task_path_str] is a copy of the row from the DataFrame, not a view. When I execute the line above, I get the following:
<ipython-input-26-2d4fae3863b0>:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
multi_exec_df.loc[task_path_str][hex_name, leaf_name] = dirpath
(Maybe you knew that, but you didn't mention it, so I'm pointing it out. I'm not sure why you didn't get this warning. If you are not familiar with what a view is, read the documentation linked in the warning above.)
You asked "Is this formulation no longer allowed?"
Obviously it is allowed, but you must accept that you are assigning the new value to a copy of the row, not the row in the original dataframe.
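If you really do want to work on the row as a separate object, one option is to make the copy explicit and write it back yourself. This is only a sketch under my own assumptions (the names df, cols and row are hypothetical, and the frame uses dtype=object so the string fits without a dtype change); it is not what your original line does.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_arrays(
    [["123456abc"] * 3, ["one", "two", "three"]],
    names=["hex_name", "leaf_name"],
)
df = pd.DataFrame(np.nan, index=["thepath"], columns=cols, dtype=object)

# Take an explicit copy of the row and modify the copy.
row = df.loc["thepath"].copy()
row[("123456abc", "one")] = "/a/string/path"

# The DataFrame is untouched at this point, because row is a separate object.
unchanged = pd.isna(df.loc["thepath", ("123456abc", "one")])

# Write the modified row back in a single .loc assignment.
df.loc["thepath"] = row
```

Here the copy is deliberate, so nothing depends on whether pandas happens to hand back a view or a copy.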
I don't know whether returning a copy instead of a view here changed at some point in Pandas development, if that is what you are asking.
This was done with Pandas 1.5.1.
Upvotes: 1