Reputation: 21
I have a DataFrame with a MultiIndex and I want to change the values of one of its levels.
For example:
import pandas as pd
index = [1,23,356,405,513,65,6787,898,679]
index_2 = ["A","B","C","D","E","F","G","H","I"]
names = ["James","Adam","Mary","Tom","Sam","Harry","Jacob","Isa","Rick"]
df_test = pd.DataFrame(data=names, index=[index, index_2])
This gives me a DataFrame with a two-level index. The first level ("index") holds arbitrary numbers like the ones above. I want to replace that level with default values 0, 1, 2, 3, and so on, instead of those arbitrary numbers.
My real dataset is very large and its rows carry the same kind of arbitrary numbering, which I want to replace with a default 0, 1, 2, ... index.
So my question is: how do I replace the values of that level with a default index?
Upvotes: 2
Views: 1446
Reputation: 29635
If df_test already exists, you can rebuild the index with pd.MultiIndex.from_arrays: take the codes of the index level you want to replace with incremental values, and use get_level_values for the other level.
# assume df_test created like this
index = [1,1,356,356,356,6787,6787,6787,6787] #change this to be more like your problem
index_2 = ["A","B","C","D","E","A","B","C","D"]
names= ["James","Adam","Mary","Tom","Sam","Harry","Jacob","Isa","Rick"]
df_test = pd.DataFrame(data=names, index=[index, index_2])
print (df_test)
            0
1    A  James
     B   Adam
356  C   Mary
     D    Tom
     E    Sam
6787 A  Harry
     B  Jacob
     C    Isa
     D   Rick
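# replace the first index level with consecutive integers and keep the second level
# note: codes follow the sorted order of the unique labels, not their order of appearance
df_test.index = pd.MultiIndex.from_arrays([df_test.index.codes[0],
                                           df_test.index.get_level_values(1)])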
print (df_test)
         0
0 A  James
  B   Adam
1 C   Mary
  D    Tom
  E    Sam
2 A  Harry
  B  Jacob
  C    Isa
  D   Rick
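One caveat: codes number the labels by their position among the sorted unique labels. With unsorted labels, as in the question's original example, the result would not run 0, 1, 2, ... from top to bottom. If numbering by order of first appearance is what is wanted (an assumption on my part), pd.factorize is one way to get it. A minimal sketch; first_level_codes is just an illustrative name:

```python
import pandas as pd

# the question's original, unsorted example
index = [1, 23, 356, 405, 513, 65, 6787, 898, 679]
index_2 = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
names = ["James", "Adam", "Mary", "Tom", "Sam", "Harry", "Jacob", "Isa", "Rick"]
df_test = pd.DataFrame(data=names, index=[index, index_2])

# factorize assigns 0, 1, 2, ... to labels in order of first appearance;
# repeated labels get the same code
first_level_codes, _ = pd.factorize(df_test.index.get_level_values(0))

# rebuild the index: new codes for level 0, original labels for level 1
df_test.index = pd.MultiIndex.from_arrays(
    [first_level_codes, df_test.index.get_level_values(1)]
)
```

Rows that shared a label in the original first level still share a number afterwards, so the grouping is preserved.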
Upvotes: 1
Reputation: 28649
I'm not sure this is what you are after; a sample of your expected output would help:
# drop the level holding the random numbers
df_test = df_test.droplevel(0)
# if the letters are unique, their positions in the index give 0, 1, 2, ...
new_index = df_test.index.get_indexer_for(df_test.index)
# if the letters are not unique, use this instead:
from itertools import chain, islice
c = chain.from_iterable
e = enumerate
# enumerate yields (position, letter) pairs; flattening them and taking
# every second item keeps only the positions 0, 1, 2, ...
new_index = list(islice(c(e(df_test.index)), 0, None, 2))
# append the new index, move it to the outermost level, and keep the result
df_test = df_test.set_index(new_index, append=True).swaplevel(1, 0)
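If a plain row counter is all that is needed, the same result can probably be reached without itertools by rebuilding the index directly. A sketch under that assumption, reusing the example data from the other answer; np.arange is used only to generate the positions:

```python
import numpy as np
import pandas as pd

index = [1, 1, 356, 356, 356, 6787, 6787, 6787, 6787]
index_2 = ["A", "B", "C", "D", "E", "A", "B", "C", "D"]
names = ["James", "Adam", "Mary", "Tom", "Sam", "Harry", "Jacob", "Isa", "Rick"]
df_test = pd.DataFrame(data=names, index=[index, index_2])

# replace the first level with row positions 0..n-1 and keep the letters
df_test.index = pd.MultiIndex.from_arrays(
    [np.arange(len(df_test)), df_test.index.get_level_values(1)]
)
```

This gives every row its own number, so any grouping encoded in the original first level is lost.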
Upvotes: 0
Reputation: 12008
You could pass a range with the length of the data to your index:
range_1 = list(range(len(names)))
# pass names directly: wrapping it in a list would create one row with nine columns
df_test = pd.DataFrame(data=names, index=[range_1, index_2])
Upvotes: 0