Reputation: 4914
Why can I rename a row label in a pandas DataFrame to a tuple of strings such as ('1', '2'), but not to a tuple of numbers such as (1, 2)? Why does the type of the values in the tuple matter?
df = pd.DataFrame({'a': [1,2,3,4,5], 'b':[1,1,1,1,1,]}).set_index('a')
df.rename(index={1:(1,2)})
*** ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
df.rename(index={1:('1','2')})
        b
a
(1, 2)  1
2       1
3       1
4       1
5       1
I'd very much like to keep the values in the tuple as integers or floats.
Upvotes: 3
Views: 1239
Reputation: 109546
I'm not sure why it can't be done using rename, but you can create integer or float tuples in a list and then assign the result to the index.
This works in Pandas 0.14.1:
idx = [(1, 2), 2, 3, 4, 5]
df.index = idx
>>> df
        b
(1, 2)  1
2       1
3       1
4       1
5       1
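If you'd rather not type out the whole list, the same idea can be sketched programmatically. The names replacements and new_labels are my own (hypothetical, not part of pandas), and this assumes that assigning a list mixing tuples and scalars to df.index gives a plain object index, as in the output above.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [1, 1, 1, 1, 1]}).set_index('a')

# Hypothetical mapping from old label to new label (illustration only).
replacements = {1: (1, 2)}

# Build the new labels in plain Python, so the tuple never goes through rename.
new_labels = [replacements.get(label, label) for label in df.index]

# Assigning the list replaces the index; note that this drops the index name 'a'.
df.index = new_labels
```

Labels that are not in the mapping are left unchanged, so only the row that was labelled 1 gets the tuple.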
EDIT Here are some timing comparisons with a 500k row dataframe.
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1,2,3,4,5]*100000, 'b':[1,1,1,1,1,]*100000})
# Create 100k random numbers in the range of the index.
# (randint excludes the upper bound; random_integers is deprecated in NumPy.)
rn = np.random.randint(0, 500000, 100000)
# Normal lookup using `loc`.
>>> %%timeit -n 3 some_list = []
[some_list.append(df.loc[a]) for a in rn]
3 loops, best of 3: 6.63 s per loop
# Normal lookup using 'xs' (used only for getting values, not setting them).
>>> %%timeit -n 3 some_list = []
[some_list.append(df.xs(a)) for a in rn]
3 loops, best of 3: 4.46 s per loop
# Set the index to tuple pairs and lookup using 'xs'.
idx = [(a, a + 1) for a in np.arange(500000)]
df.index = idx
>>> %%timeit -n 3 some_list = []
[some_list.append(df.xs((a, a + 1))) for a in rn]
3 loops, best of 3: 4.64 s per loop
As you can see, looking up values with the tuple index (4.64 s) takes about the same time as with the plain integer index (4.46 s using 'xs').
Note that you cannot use 'loc' with the tuple index:
>>> df.loc[(1, 2)]
KeyError: 'the label [1] is not in the [index]'
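If you need something other than xs for a tuple label, one option is to go through the label's position instead. This is only a rough sketch on the small frame from the question, and it assumes that Index.get_loc treats the tuple as a single hashable label in your pandas version.

```python
import pandas as pd

df = pd.DataFrame({'b': [1, 1, 1, 1, 1]})
df.index = [(1, 2), 2, 3, 4, 5]

# Find the integer position of the tuple label, then select the row by position.
pos = df.index.get_loc((1, 2))
row = df.iloc[pos]
```

Since iloc works purely by position, the tuple is never interpreted as a (row, column) pair.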
Upvotes: 1