Reputation: 4063

pandas dataframe: Changing from single index to multi-column index

In python pandas I have a dataframe

df_aaa:

date     data  otherdata symbol
2015/1/1 11    12         aaa
2015/2/1 21    22         aaa
2015/3/1 31    31         aaa

df_all:

2015/1/1 31    31         bbb

Currently the index of both is by date .

I want to append df_aaa to df_all, and have them with a composite index of both symbol and date.

How do I do that?

Basically the following are all one question: How do I set a multi-index and use it when appending. Can I do it with different column order? Do I need to refresh? Etc.:

I'm not sure if a multi-index is an index that has multiple 'columns' (or rows), or is it the ability to have more than one index (and any of them could be for multiple columns or rows). Or are both correct?
Must I first set the index of both dataframes to a multi-index, so the append will work? (otherwise I'll have duplicates for different symbols
Do I have to "drop" the existing index before creating the new one?
Is there such a thing as a dataframe with data but no index?
Must a (single) index be of unique values?
When do I use which of the following dataframe methods: set_index(), reindex(), reset_index(), set_level, reset_level?
And what is the default when I give these methods an array. Python docs are daunting, and I can't find my hands or legs in them. Giving some good examples would help...
Do I have to add anything (like axis=1) when setting the index?
How do I set the index to be the data in a column. (And why does sometimes using ['symbol', 'date'] as a parameter, give me a new column with those two values, instead of setting the index on the existing values of the columns with those two names?)
After I append and assuming the old index is correct do I need to 'update' the index (perhaps using reindex?) or since I told the dataframe that the index is in a certain column, is my data correctly indexed?
And since my dataframes (will) have indices on the same column name, can I do an append of df_aaa on df_all even if df_all was defined to have the columns originally in a different order. (say: ['symbol', 'date', 'data', 'otherdata'] with symbol the first column)?

Upvotes: 1

Answers (3)

pashute

Reputation: 4063

Here is what I gather from the answers and dragging through the docs:

There is a "default index" which is a "row-number" for each row, and which is not part of any of the columns.

When merging with that index, there (seems to be) no need to re-index.

But if I want to change the index after it was made "non-standard" I have to "reset_index()" and turn it back to the default, and then from there I can create the new multi index (as explained in the revisioned answer below)

A multi-index is one that has more than one key (i.e. if indexing the rows, then more than one column will be used).

I'm still not sure if you have to re-index a column after a merge, but according to this it seems you get an automatically generated new "default index" and have to save the old one, remove the index before merge (reset_index) and set it again when done.

The other question about the index replacing a column - I'll check and get back here.

This is a follow-up.

Upvotes: 1

tworec

Reputation: 4747

Just append df's and reset_index() to be able to set_index() with keys argument. Here's oneliner:

df_all = df_all.append(df_aaa).reset_index().set_index(keys=['symbol', 'date'])

And here is full working sample.

In [1]: import pandas as pd
   ...: from io import StringIO
   ...: 

In [2]: df_aaa = pd.read_csv(StringIO("""date     data  otherdata symbol
   ...: 2015/1/1 11    12         aaa
   ...: 2015/2/1 21    22         aaa
   ...: 2015/3/1 31    31         aaa
   ...: """), sep="\s+", index_col='date')
   ...: 

In [3]: df_all = pd.read_csv(StringIO("""date     data  otherdata symbol
   ...: 2015/1/1 31    31         bbb"""), sep="\s+", index_col='date')
   ...: 

In [4]: df_all.append(df_aaa).reset_index().set_index(keys=['symbol', 'date'])
Out[4]: 
                 data  otherdata
symbol date                     
bbb    2015/1/1    31         31
aaa    2015/1/1    11         12
       2015/2/1    21         22
       2015/3/1    31         31

Upvotes: 3

cs95

Reputation: 403130

You can just concatenate them and then set the index.

df_aaa = df_aaa.reset_index()
df_all = df_all.reset_index()

df = df_aaa.append(df_all).set_index(['symbol', 'date'])

Note that this would work only if your dataframes have the same column.s

If you must perform multiple appends in the future, the best thing to do would be to get one of them in the shape of the other, perform the concatenation, and reset index as needed.

I'll answer all your questions one by one.

I'm not sure if a multi-index is an index that has multiple 'columns' (or rows), or is it the ability to have more than one index (and any of them could be for multiple columns or rows). Or are both correct?

It depends on what axis you're referring to. Along the row (0^th axis), you have 2 or more columns forming a MultiIndex. Similarly for along the columns (1^st axis).

Must I first set the index of both dataframes to a multi-index, so the append will work? (otherwise I'll have duplicates for different symbols

No need. Although you could, not doing so would be simpler in this case.

Do I have to "drop" the existing index before creating the new one? No, just that the columns must align (column name and number of columns should be the same).

Is there such a thing as a dataframe with data but no index?

No. All rows are indexed. Even if there is no column as the index, the index is a monotonically increasing number. The model followed here is similar to that in RDBMs.

Must a (single) index be of unique values?

In general, the must, so rows can be uniquely identified. If you have a MultiIndex, each combination of values that make up the index must be unique.

When do I use which of the following dataframe methods: set_index(), reindex(), reset_index(), set_level, reset_level?

This is a broad question. It depends, when do you want to operate on the index and if so, what do you want to do with it? Look at the documentation for each one carefully.

Upvotes: 4

pandas dataframe: Changing from single index to multi-column index

Answers (3)

Related Questions