Naveen Gabriel

Reputation: 679

Issues with creating column name for DataFrame in Python3

I do not understand why "u" contains only NaN values. What am I doing wrong here?

>>> z=pd.DataFrame([['abcb','asasa'],['sdsd','aeio']])
>>> z
      0      1
0  abcb  asasa
1  sdsd   aeio
>>> u=pd.DataFrame(z,columns=['hello','ajsajs'])
>>> u
   hello  ajsajs
0    NaN     NaN
1    NaN     NaN

Upvotes: 1

Views: 114

Answers (1)

Brad Solomon

Reputation: 40878

Alternate construction calls

You can use the underlying NumPy array:

u = pd.DataFrame(z.values, columns=['hello','ajsajs'])

  hello ajsajs
0  abcb  asasa
1  sdsd   aeio

Alternately, you could use:

u = z.rename(columns={0: 'hello',1: 'ajsajs'})

And lastly as suggested by @Dark:

u = z.set_axis(['hello','ajsajs'], axis=1, inplace=False)

A small note on inplace in set_axis -

WARNING: inplace=None currently falls back to True, but in a future version, will default to False. Use inplace=True explicitly rather than relying on the default.

In pandas 0.20.3 the syntax would be just:

u = z.set_axis(axis=1, labels=['hello','ajsajs'])

@Dark's solution appears fastest here.
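As a quick check that the three alternatives agree, here is a minimal sketch. It assumes a recent pandas release in which set_axis returns a new object by default, so the inplace argument is omitted; on the older versions discussed above, keep the explicit inplace=False.

```python
import pandas as pd

z = pd.DataFrame([['abcb', 'asasa'], ['sdsd', 'aeio']])
new_cols = ['hello', 'ajsajs']

# 1) Rebuild from the raw values, so no label matching takes place.
u1 = pd.DataFrame(z.values, columns=new_cols)

# 2) Map each old label to a new one.
u2 = z.rename(columns={0: 'hello', 1: 'ajsajs'})

# 3) Replace the column axis wholesale
#    (assumption: set_axis returns a new object by default).
u3 = z.set_axis(new_cols, axis=1)

for u in (u1, u2, u3):
    print(list(u.columns), u.values.tolist())
```

All three leave z itself untouched and keep the original data under the new labels.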

Why current method doesn't work

I believe the issue here is that there's a .reindex being called when the DataFrame is constructed in this way. Here's some source code where ellipses denote irrelevant stuff I'm leaving out:

from pandas.core.internals import BlockManager

# pandas.core.frame.DataFrame
class DataFrame(NDFrame):
    def __init__(self, data=None, index=None, columns=None, dtype=None,
                 copy=False):
        # ...
        if isinstance(data, DataFrame):
            data = data._data
        if isinstance(data, BlockManager):
            mgr = self._init_mgr(data, axes=dict(index=index, columns=columns),
                                 dtype=dtype, copy=copy)
        # ... a bunch of other if statements irrelevant to your case
        NDFrame.__init__(self, mgr, fastpath=True)
        # ...

What's happening here:

  • DataFrame inherits from a more generic base class which in turn has multiple inheritance. (Pandas is great, but its source can be like trying to backtrack through a spider's web.)
  • In u = pd.DataFrame(z,columns=['hello','ajsajs']), z is a DataFrame. Therefore, the first if statement above is True and data = data._data. What's _data? It's a BlockManager.* (To be continued below...)
  • Because we just converted what you passed to its BlockManager, the next if statement also evaluates True. Then mgr gets assigned to the result of the _init_mgr method and the parent class's __init__ gets called, passing mgr.

* confirm with isinstance(z._data, BlockManager).

Now on to part 2...

# pandas.core.generic.NDFrame
class NDFrame(PandasObject, SelectionMixin):
    def __init__(self, data, axes=None, copy=False, dtype=None,
                 fastpath=False):
        # ...

    def _init_mgr(self, mgr, axes=None, dtype=None, copy=False):
        """ passed a manager and a axes dict """
        for a, axe in axes.items():
            if axe is not None:
                mgr = mgr.reindex_axis(axe,
                                       axis=self._get_block_manager_axis(a),
                                       copy=False)
        # ...
        return mgr

Here is where _init_mgr is defined, which gets called above. Essentially in your case you have:

columns=['hello','ajsajs']
axes=dict(index=None, columns=columns)
# ...

When you reindex an axis and pass new labels, none of which exist in the old object, you get all NaNs: labels are matched against the existing axis rather than assigned by position. This seems like a deliberate design decision. Consider this related example to prove the point, where one requested column is present in z and one is not:

pd.DataFrame(z, columns=[0, 'ajsajs'])

      0  ajsajs
0  abcb     NaN
1  sdsd     NaN
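The same behaviour can be imitated with public API only. The sketch below is an illustration under assumptions, not pandas' actual code: init_like_constructor is a hypothetical helper that mirrors the loop in _init_mgr by calling DataFrame.reindex for each axis that was passed.

```python
import pandas as pd


def init_like_constructor(frame, index=None, columns=None):
    """Hypothetical helper: mimic the axis loop in _init_mgr via public reindex."""
    axes = dict(index=index, columns=columns)
    for name, labels in axes.items():
        if labels is not None:
            # Labels are matched against the existing axis;
            # labels that are not found produce all-NaN rows/columns.
            frame = frame.reindex(**{name: labels})
    return frame


z = pd.DataFrame([['abcb', 'asasa'], ['sdsd', 'aeio']])

no_match = init_like_constructor(z, columns=['hello', 'ajsajs'])
partial = init_like_constructor(z, columns=[0, 'ajsajs'])

print(no_match)   # neither label exists in z, so every cell is missing
print(partial)    # column 0 keeps its data, 'ajsajs' is all NaN
```

Seen this way, passing columns= alongside an existing DataFrame selects columns by label; it does not rename them.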

Upvotes: 3
