Reputation: 679
I do not understand why "u" has NaN values. What am I doing wrong here?
>>> z=pd.DataFrame([['abcb','asasa'],['sdsd','aeio']])
>>> z
      0      1
0  abcb  asasa
1  sdsd   aeio
>>> u=pd.DataFrame(z,columns=['hello','ajsajs'])
>>> u
  hello ajsajs
0   NaN    NaN
1   NaN    NaN
Upvotes: 1
Views: 114
Reputation: 40878
You can use the underlying NumPy array:
u = pd.DataFrame(z.values, columns=['hello','ajsajs'])
  hello ajsajs
0  abcb  asasa
1  sdsd   aeio
Alternately, you could use:
u = z.rename(columns={0: 'hello',1: 'ajsajs'})
And lastly as suggested by @Dark:
u = z.set_axis(['hello','ajsajs'], axis=1, inplace=False)
A small note on inplace in set_axis:
WARNING: inplace=None currently falls back to True, but in a future version, will default to False. Use inplace=True explicitly rather than relying on the default.
In pandas 0.20.3 the syntax would be just:
u = z.set_axis(axis=1, labels=['hello','ajsajs'])
@Dark's solution appears fastest here.
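The renaming options above can be checked side by side. This is a minimal sketch that rebuilds the question's frame and compares the z.values and rename approaches; the names new_cols, u_values and u_rename are made up for illustration. set_axis is left out because its inplace handling differs between pandas versions, as noted above.

```python
import pandas as pd

z = pd.DataFrame([['abcb', 'asasa'], ['sdsd', 'aeio']])
new_cols = ['hello', 'ajsajs']  # hypothetical helper name

# Option 1: rebuild from the raw values, so no label alignment takes place.
u_values = pd.DataFrame(z.values, columns=new_cols)

# Option 2: map the old labels to the new ones.
u_rename = z.rename(columns={0: 'hello', 1: 'ajsajs'})

# Both frames keep the original strings under the new column names.
print(u_values)
print(u_rename)
```

Neither call modifies z, so the original integer column labels stay available afterwards.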
I believe the issue here is that there's a .reindex being called when the DataFrame is constructed in this way. Here's some source code, where ellipses denote irrelevant stuff I'm leaving out:
from pandas.core.internals import BlockManager

# pandas.core.frame.DataFrame
class DataFrame(NDFrame):
    def __init__(self, data=None, index=None, columns=None, dtype=None,
                 copy=False):
        # ...
        if isinstance(data, DataFrame):
            data = data._data

        if isinstance(data, BlockManager):
            mgr = self._init_mgr(data, axes=dict(index=index, columns=columns),
                                 dtype=dtype, copy=copy)
        # ... a bunch of other if statements irrelevant to your case

        NDFrame.__init__(self, mgr, fastpath=True)
        # ...
What's happening here:
- The DataFrame is constructed with u = pd.DataFrame(z,columns=['hello','ajsajs']), where z is a DataFrame. Therefore, the first if statement above is True and data = data._data. What's _data? It's a BlockManager.* (To be continued below...)
- Because data is now a BlockManager, the second if statement also evaluates True. Then mgr gets assigned to the result of the _init_mgr method, and the parent class's __init__ gets called, passing mgr.
* You can confirm this with isinstance(z._data, BlockManager).
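A quick way to inspect the object behind a DataFrame is sketched below. It relies on private attributes, so treat it as an assumption rather than a stable API: the source shown here uses _data, while newer pandas releases expose the same object as _mgr, and the fallback between the two is my own addition.

```python
import pandas as pd

z = pd.DataFrame([['abcb', 'asasa'], ['sdsd', 'aeio']])

# Private attribute (assumption): `_mgr` in newer pandas, `_data` in older releases.
mgr = getattr(z, '_mgr', None)
if mgr is None:
    mgr = z._data

# Print the class name of the internal manager object.
print(type(mgr).__name__)
```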
Now on to part 2...
# pandas.core.generic.NDFrame
class NDFrame(PandasObject, SelectionMixin):
    def __init__(self, data, axes=None, copy=False, dtype=None,
                 fastpath=False):
        # ...

    def _init_mgr(self, mgr, axes=None, dtype=None, copy=False):
        """ passed a manager and a axes dict """
        for a, axe in axes.items():
            if axe is not None:
                mgr = mgr.reindex_axis(axe,
                                       axis=self._get_block_manager_axis(a),
                                       copy=False)
        # ...
        return mgr
Here is where _init_mgr is defined, which gets called above. Essentially, in your case you have:
columns=['hello','ajsajs']
axes=dict(index=None, columns=columns)
# ...
When you reindex an axis and specify new labels, none of which exist in the old object, you get all NaNs: the columns argument selects existing columns by label rather than renaming them. This seems like a deliberate design decision. Consider this related example to illustrate the point, where one new column is present in z and one is not:
pd.DataFrame(z, columns=[0, 'ajsajs'])
      0 ajsajs
0  abcb    NaN
1  sdsd    NaN
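The same behaviour can be reproduced through the public API. The sketch below assumes that an explicit z.reindex(columns=...) is a fair stand-in for what the constructor does internally; the names r and partial are made up for illustration.

```python
import pandas as pd

z = pd.DataFrame([['abcb', 'asasa'], ['sdsd', 'aeio']])

# Passing columns= together with an existing DataFrame selects by label...
u = pd.DataFrame(z, columns=['hello', 'ajsajs'])
# ...which gives the same picture as an explicit reindex on the columns.
r = z.reindex(columns=['hello', 'ajsajs'])

# Neither label exists in z, so every cell is missing in both frames.
print(u.isna().all().all())
print(r.isna().all().all())

# With one existing label (0) and one new label, only the new column is empty.
partial = pd.DataFrame(z, columns=[0, 'ajsajs'])
print(partial[0].tolist())
print(partial['ajsajs'].isna().all())
```

If the goal is to relabel rather than select, pass z.values or use rename, as in the options at the top of this answer.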
Upvotes: 3