Tony
Tony

Reputation: 241

DataFrame Index Created From Columns

I have a dataframe that I am using TIA to populate data from Bloomberg. When I look at df.index I see that the data that I intended to be columns is presented to me as what appears to be a multi-index. The output for df.columns is like this:

Index([u'column1','u'column2'])

I have tried various iterations of reset_index but have not been able to remedy this situation.

1) what about the TIA manager causes the dataframe columns to be read in as an index?

2) How can I properly identify these columns as columns instead of a multi-index?

The ultimate problem that I'm trying to fix is that when I try to add this column to df2, the values for that column in df2 come out as NaT. Like below:

df2['column3'] = df1['column1']

Produces:

df2
column1 column2 column3
1135     32       NaT
1351      43      NaT
35        13      NaT
135       13      NaT

Upvotes: 0

Views: 93

Answers (1)

unutbu
unutbu

Reputation: 879143

From the comments it appears df1 and df2 have completely different indexes

In [396]: df1.index
Out[400]: Index(['Jan', 'Feb', 'Mar', 'Apr', 'May'], dtype='object')

In [401]: df2.index
Out[401]: Index(['One', 'Two', 'Three', 'Four', 'Five'], dtype='object')

but we wish to assign values from df1 to df2, preserving order.

Usually, Pandas operations try to automatically align values based on index (and/or column) labels. In this case, we wish to ignore the labels. To do that, use

df2['columns3'] = df1['column1'].values

df1['column1'].values is a NumPy array. Since it doesn't have a Index, Pandas simply assigns the values in the array into df2['columns3'] in order. The assignment would behave the same way if the right-hand side were a list or a tuple. Note that this also relies on len(df1) equaling len(df2).


For example,

import pandas as pd

df1 = pd.DataFrame(
    {"column1": [1135, 1351, 35, 135, 0], "column2": [32, 43, 13, 13, 0]},
    index=[u"Jan", u"Feb", u"Mar", u"Apr", u"May"],
)
df2 = pd.DataFrame(
    {"column1": range(len(df1))}, index=[u"One", u"Two", u"Three", u"Four", u"Five"]
)
df2["columns3"] = df1["column1"].values
print(df2)

yields

       column1  columns3
One          0      1135
Two          1      1351
Three        2        35
Four         3       135
Five         4         0

Alternatively, you could make the two Indexs the same, and then df2["columns3"] = df1["column1"] would produce the same result (but now because the index labels are being aligned):

df1.index = df2.index
df2["columns3"] = df1["column1"]

Another way to make the Indexs match, is to reset the index on both DataFrames:

df1 = df1.reset_index()
df2 = df2.reset_index()
df2["columns3"] = df1["column1"]

reset_index moves the old index into a column named index by default (if index.name was None). Integers (starting with 0) are assigned as the new index labels:

In [402]: df1.reset_index()
Out[410]: 
  index  column1  column2
0   Jan     1135       32
1   Feb     1351       43
2   Mar       35       13
3   Apr      135       13
4   May        0        0

Upvotes: 1

Related Questions