pashute
pashute

Reputation: 4063

pandas dataframe: Changing from single index to multi-column index

In python pandas I have a dataframe

df_aaa:

date     data  otherdata symbol
2015/1/1 11    12         aaa
2015/2/1 21    22         aaa
2015/3/1 31    31         aaa

df_all:

2015/1/1 31    31         bbb

Currently the index of both is by date .

I want to append df_aaa to df_all, and have them with a composite index of both symbol and date.

  1. How do I do that?

Basically the following are all one question: How do I set a multi-index and use it when appending. Can I do it with different column order? Do I need to refresh? Etc.:

Upvotes: 1

Views: 4682

Answers (3)

pashute
pashute

Reputation: 4063

Here is what I gather from the answers and dragging through the docs:

There is a "default index" which is a "row-number" for each row, and which is not part of any of the columns.

When merging with that index, there (seems to be) no need to re-index.

But if I want to change the index after it was made "non-standard" I have to "reset_index()" and turn it back to the default, and then from there I can create the new multi index (as explained in the revisioned answer below)

A multi-index is one that has more than one key (i.e. if indexing the rows, then more than one column will be used).

I'm still not sure if you have to re-index a column after a merge, but according to this it seems you get an automatically generated new "default index" and have to save the old one, remove the index before merge (reset_index) and set it again when done.

The other question about the index replacing a column - I'll check and get back here.

This is a follow-up.

Upvotes: 1

tworec
tworec

Reputation: 4747

Just append df's and reset_index() to be able to set_index() with keys argument. Here's oneliner:

df_all = df_all.append(df_aaa).reset_index().set_index(keys=['symbol', 'date'])

And here is full working sample.

In [1]: import pandas as pd
   ...: from io import StringIO
   ...: 

In [2]: df_aaa = pd.read_csv(StringIO("""date     data  otherdata symbol
   ...: 2015/1/1 11    12         aaa
   ...: 2015/2/1 21    22         aaa
   ...: 2015/3/1 31    31         aaa
   ...: """), sep="\s+", index_col='date')
   ...: 

In [3]: df_all = pd.read_csv(StringIO("""date     data  otherdata symbol
   ...: 2015/1/1 31    31         bbb"""), sep="\s+", index_col='date')
   ...: 

In [4]: df_all.append(df_aaa).reset_index().set_index(keys=['symbol', 'date'])
Out[4]: 
                 data  otherdata
symbol date                     
bbb    2015/1/1    31         31
aaa    2015/1/1    11         12
       2015/2/1    21         22
       2015/3/1    31         31

Upvotes: 3

cs95
cs95

Reputation: 403130

You can just concatenate them and then set the index.

df_aaa = df_aaa.reset_index()
df_all = df_all.reset_index()

df = df_aaa.append(df_all).set_index(['symbol', 'date'])

Note that this would work only if your dataframes have the same column.s

If you must perform multiple appends in the future, the best thing to do would be to get one of them in the shape of the other, perform the concatenation, and reset index as needed.


I'll answer all your questions one by one.

I'm not sure if a multi-index is an index that has multiple 'columns' (or rows), or is it the ability to have more than one index (and any of them could be for multiple columns or rows). Or are both correct?

It depends on what axis you're referring to. Along the row (0th axis), you have 2 or more columns forming a MultiIndex. Similarly for along the columns (1st axis).

Must I first set the index of both dataframes to a multi-index, so the append will work? (otherwise I'll have duplicates for different symbols

No need. Although you could, not doing so would be simpler in this case.

Do I have to "drop" the existing index before creating the new one? No, just that the columns must align (column name and number of columns should be the same).

Is there such a thing as a dataframe with data but no index?

No. All rows are indexed. Even if there is no column as the index, the index is a monotonically increasing number. The model followed here is similar to that in RDBMs.

Must a (single) index be of unique values?

In general, the must, so rows can be uniquely identified. If you have a MultiIndex, each combination of values that make up the index must be unique.

When do I use which of the following dataframe methods: set_index(), reindex(), reset_index(), set_level, reset_level?

This is a broad question. It depends, when do you want to operate on the index and if so, what do you want to do with it? Look at the documentation for each one carefully.

Upvotes: 4

Related Questions