Reputation: 405
I am writing an application that makes use of pandas (version 0.10.1) to store the underlying data model as a (3-level) MultiIndex'ed DataFrame. The model is a line spectrum, and the top level of the index is the atomic transition.
A simple dataframe could look like this:
                                Pos     Sigma       Ampl  Line center  Identifier
H-alpha-6697.6 30-30 Comp2   -3.600  0.774000  33.058000       6699.5           b
                     Comp3    3.538  2.153000  28.054000       6699.5           c
                     Contin     NaN       NaN   0.000000          NaN         NaN
                     Comp4    1.384  0.921000  37.504000       6699.5           d
                     Comp1   -2.124  1.977000  69.166000       6699.5           a
               31-31 Comp2   -3.292  0.884603  49.813423       6699.5           b
                     Comp3    3.600  2.299000  19.999000       6699.5           c
                     Contin     NaN       NaN   0.000000          NaN         NaN
                     Comp4    1.692  1.009000  22.222000       6699.5           d
                     Comp1   -1.262  2.534000  68.002000       6699.5           a
At some point, I need to be able to create a different transition, e.g. H-beta, using H-alpha as a template. Ideally I would do this with something like df.ix['H-beta-wavelength'] = df.ix['H-alpha-6697.6'], but this is not possible. So instead, I tried following this example: Prepend a level to a pandas MultiIndex
However, that example requires the .names of the MultiIndex levels to be set in order to reorder them. The names attribute is set when the dataframe is initialized, but while building it I rely quite extensively on the set_value() method, and doing this destroys the names attribute, or rather sets them to [None, None, None].
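For comparison, on current pandas (where .ix and set_value() no longer exist) the "prepend a level" step can be done without touching the names: pd.concat with a dict uses the dict key as a new outermost level. A minimal sketch, assuming a two-row stand-in for the frame above and the level names Transition/Rows/Component from the later example; "H-beta" is just a placeholder label:

```python
import pandas as pd

# Cut-down, hypothetical version of the spectrum frame (values from the question).
idx = pd.MultiIndex.from_tuples(
    [("H-alpha-6697.6", "30-30", "Comp1"), ("H-alpha-6697.6", "30-30", "Comp2")],
    names=["Transition", "Rows", "Component"],
)
df = pd.DataFrame({"Pos": [-2.124, -3.600], "Sigma": [1.977, 0.774]}, index=idx)

# Selecting one transition with .loc drops the top level: index is (Rows, Component).
template = df.loc["H-alpha-6697.6"]

# Concatenating a dict prepends its key as a new outer level; `names` labels
# that new level, and the inner level names are carried over from `template`.
new_block = pd.concat({"H-beta": template}, names=["Transition"])

df = pd.concat([df, new_block])
```

The copied block can then be edited in place before use, since it is an independent set of rows.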
Example:
In [68]: df
Out[68]:
                                   Pos   Sigma      Ampl  Line center  Identifier
Transition     Rows  Component
Center: 6699.5 26-26 Comp2      -3.846   0.657   15.2740       6699.5           b
                     Comp3       2.924   1.449   31.3930       6699.5           c
                     Contin        NaN     NaN    0.0000          NaN         NaN
                     Comp4       8.030   1.009    7.0831       6699.5           d
                     Comp1      -1.816   2.153   50.2750       6699.5           a
In [69]: df.set_value(('Center: 5044.3', '26-26', 'Comp1'), 'Sigma', 2.457)
Out[69]:
                                Pos   Sigma      Ampl  Line center  Identifier
Center: 6699.5 26-26 Comp2   -3.846   0.657   15.2740       6699.5           b
                     Comp3    2.924   1.449   31.3930       6699.5           c
                     Contin     NaN     NaN    0.0000          NaN         NaN
                     Comp4    8.030   1.009    7.0831       6699.5           d
                     Comp1   -1.816   2.153   50.2750       6699.5           a
Center: 5044.3 26-26 Comp1      NaN   2.457       NaN          NaN         NaN
Of course, this makes it quite hard to use the names for reordering the levels of the MultiIndex. Is there a way to avoid this, short of brute-force resetting the names every time I have run set_value()?
Here is an IPython session recreating the index.names problem with a somewhat simpler example. It also shows that this may be a bug that goes beyond index.names, as it seems to change index.lexsort_depth from 3 to 0. Gaps in the prompt numbering are just omitted views of the dataframe.
I believe that, to reproduce it, the secondary and/or tertiary index values must already exist, as they do below.
In [4]: idx = pd.MultiIndex.from_arrays(
    [['Hans']*4 + ['Grethe']*4, ['1', '1', '2', '2']*2, ['a', 'b']*4],
    names=['Name', 'Number', 'Letter'])
In [5]: df = pd.DataFrame(
    np.random.random((8, 3)),
    columns=['one', 'two', 'three'],
    index=idx)
In [6]: df
Out[6]:
                           one       two     three
Name   Number Letter
Hans   1      a       0.803566  0.434574  0.805976
              b       0.655322  0.208469  0.989559
       2      a       0.893952  0.380358  0.173764
              b       0.822446  0.673894  0.676573
Grethe 1      a       0.202641  0.387263  0.405296
              b       0.646733  0.086953  0.882114
       2      a       0.358458  0.147107  0.769586
              b       0.183782  0.477863  0.601098
# To rule out another possible source of problems:
In [9]: df.unstack().drop(('Grethe', '1')).stack()
Out[9]:
                           one       two     three
Name   Number Letter
Grethe 2      a       0.358458  0.147107  0.769586
              b       0.183782  0.477863  0.601098
Hans   1      a       0.803566  0.434574  0.805976
              b       0.655322  0.208469  0.989559
       2      a       0.893952  0.380358  0.173764
              b       0.822446  0.673894  0.676573
In [10]: df.set_value(('Frans', '2', 'b'), 'one', 23.)
Out[10]:
                           one       two     three
Hans   1      a       0.803566  0.434574  0.805976
              b       0.655322  0.208469  0.989559
       2      a       0.893952  0.380358  0.173764
              b       0.822446  0.673894  0.676573
Grethe 1      a       0.202641  0.387263  0.405296
              b       0.646733  0.086953  0.882114
       2      a       0.358458  0.147107  0.769586
              b       0.183782  0.477863  0.601098
Frans  2      b      23.000000       NaN       NaN
In [11]: df = df.sortlevel(level='Name')
In [13]: df.index.lexsort_depth
Out[13]: 3
In [14]: df.set_value(('Frans', '2', 'b'), 'one', 23.).index.lexsort_depth
Out[14]: 0
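A sketch of the same experiment on a recent pandas, for comparison. Assumptions: sortlevel() is replaced by sort_index(), set_value() by .loc setting with enlargement, and lexsort_depth (deprecated in later releases) by is_monotonic_increasing; the data is a deterministic stand-in for the random values:

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays(
    [["Hans"] * 4 + ["Grethe"] * 4, ["1", "1", "2", "2"] * 2, ["a", "b"] * 4],
    names=["Name", "Number", "Letter"],
)
# Deterministic stand-in for the random data used in the session.
df = pd.DataFrame({"one": [float(i) for i in range(8)]}, index=idx)

# sort_index() plays the role of the old sortlevel(level='Name') here.
df = df.sort_index()
sorted_before = df.index.is_monotonic_increasing

# Setting with enlargement appends the new row at the end, even though
# "Frans" sorts before "Grethe" and "Hans".
df.loc[("Frans", "2", "b"), "one"] = 23.0
sorted_after_insert = df.index.is_monotonic_increasing

# Re-sorting moves the new row to the front and makes the index sorted again.
df = df.sort_index()
sorted_after_resort = df.index.is_monotonic_increasing
```

So the loss of sortedness is expected behaviour for an append-at-the-end write; re-sorting after a batch of writes is cheaper than after every single one.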
Upvotes: 0
Views: 3437
Reputation: 405
So according to Andy Hayden, this is a names bug in pandas.
Hopefully a fix will come soon.
Until then, I believe the best way to do this is the following:
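tmp = df.ix['ExistingTransition'].copy()  # sub-frame without the top index level
tmp['Transition'] = 'NewTransition'  # the new top-level label, stored as a column
tmp = tmp.set_index('Transition', append=True)  # move it into the index as the last level
tmp.index = tmp.index.reorder_levels([2, 0, 1])  # make it the first level again
# ...Do whatever else needs to be done to this before applying as template...
df = df.append(tmp)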
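The snippet above uses 0.10.1-era API. On current pandas, .ix and DataFrame.append() are gone, so the same steps would go through .loc and pd.concat. A sketch, assuming a small stand-in frame with the index names Transition/Rows/Component and a placeholder "H-beta" label; note that reorder_levels() also accepts level names instead of the positions [2, 0, 1]:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("H-alpha-6697.6", "30-30", "Comp1"), ("H-alpha-6697.6", "30-30", "Comp2")],
    names=["Transition", "Rows", "Component"],
)
df = pd.DataFrame({"Pos": [-2.124, -3.600], "Identifier": ["a", "b"]}, index=idx)

# .loc selects the existing transition and drops the top index level.
tmp = df.loc["H-alpha-6697.6"].copy()
# Store the new top-level label as a column, then append it to the index.
tmp["Transition"] = "H-beta"
tmp = tmp.set_index("Transition", append=True)  # (Rows, Component, Transition)
# Move the new level back to the front, addressing levels by name.
tmp = tmp.reorder_levels(["Transition", "Rows", "Component"])
# pd.concat replaces the removed DataFrame.append.
df = pd.concat([df, tmp])
```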
...That, or making sure that the names attribute is recreated after each run of set_value(), and then just following the example linked in the question.
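A minimal sketch of the "recreate the names after every write" option, using a hypothetical helper set_value_keep_names(). It uses .loc because set_value() was removed in pandas 1.0, and it resets the names unconditionally instead of relying on the library to preserve them:

```python
import pandas as pd


def set_value_keep_names(df, row, col, value):
    """Hypothetical helper: set one cell (adding the row if it is new) and
    re-apply the index level names recorded before the write."""
    names = list(df.index.names)
    out = df.copy()
    # .loc supports "setting with enlargement" when `row` is a new full-length key.
    out.loc[row, col] = value
    # Brute-force step: put the recorded names back on a fresh index object.
    out.index = out.index.set_names(names)
    return out


idx = pd.MultiIndex.from_tuples(
    [("Hans", "1", "a"), ("Hans", "1", "b")], names=["Name", "Number", "Letter"]
)
df = pd.DataFrame({"one": [0.80, 0.66], "two": [0.43, 0.21]}, index=idx)
df2 = set_value_keep_names(df, ("Frans", "2", "b"), "one", 23.0)
```

Returning a copy keeps the original frame untouched; for many writes in a loop it would be cheaper to do the .loc assignments directly and reset the names once at the end.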
Upvotes: 0
Reputation: 129008
Your index needs to be sorted! See the docs here: http://pandas.pydata.org/pandas-docs/dev/indexing.html#the-need-for-sortedness and these recipes may help: http://pandas.pydata.org/pandas-docs/dev/cookbook.html (this is on 0.10.1 as well).
Here's a sorted frame:
In [26]: index = pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
names=['first', 'second'])
In [27]: df = pd.DataFrame(np.random.rand(len(index)), index=index,columns=['A'])
In [7]: df.index.lexsort_depth
Out[7]: 2
In [28]: df.set_value(('a',1),'A',1)
Out[28]:
                     A
first second
a     1       1.000000
      2       0.136456
b     1       0.712612
      2       0.818473
And if I sort by the 2nd level (so it's unsorted):
In [29]: df2 = df.sortlevel(level='second')
# this is not sorted! (well it is, just not lexsorted)
In [10]: df2.index.lexsort_depth
Out[10]: 0
In [30]: df2.set_value(('b','1'),'A',2)
Out[30]:
                     A
a     1       1.000000
b     1       0.712612
a     2       0.136456
b     2       0.818473
      1       2.000000
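For a recent pandas, a sketch of the same sorted vs. lexsorted distinction. It assumes sortlevel() has been replaced by sort_index(level=...), and uses is_monotonic_increasing, which later pandas recommends in place of the deprecated lexsort_depth; the column values are made up:

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["a", "a", "b", "b"], [1, 2, 1, 2]], names=["first", "second"]
)
df = pd.DataFrame({"A": [0.0, 1.0, 2.0, 3.0]}, index=index)

# Sorting by the second level (sortlevel(level='second') in 0.10.1) orders rows
# as (a,1), (b,1), (a,2), (b,2): sorted on 'second', but no longer lexsorted.
df2 = df.sort_index(level="second")

lexsorted = df.index.is_monotonic_increasing
lexsorted2 = df2.index.is_monotonic_increasing
```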
Upvotes: 1