Yuan Ren

Reputation: 295

adding prefix to pandas column

I'm trying to add a prefix to the values of a DataFrame column in pandas. It is supposed to be very easy:

import pandas as pd
a=pd.DataFrame({
    'x':[1,2,3],
})
#this one works;
"mm"+a['x'].astype(str)
0    mm1
1    mm2
2    mm3
Name: x, dtype: object

But surprisingly, if I want to use a prefix of single letter 'm', it stops working:

#this one doesn't work
"m"+a['x'].astype(str)
TypeError                                 Traceback (most recent call last)
<ipython-input-21-808db8051ebc> in <module>
      1 #this one doesn't work
----> 2 "m"+a['x'].astype(str)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops\__init__.py in wrapper(left, right)
   1014             if is_scalar(right):
   1015                 # broadcast and wrap in a TimedeltaIndex
-> 1016                 assert np.isnat(right)
   1017                 right = np.broadcast_to(right, left.shape)
   1018                 right = pd.TimedeltaIndex(right)

TypeError: ufunc 'isnat' is only defined for datetime and timedelta.

So my questions are:

  1. How to solve the problem?

  2. What is happening here? It seems pandas is trying to do something fancy.

  3. Why is 'm' so special? (Other single letters seem to be fine, e.g. 'b'.)

Upvotes: 4

Views: 223

Answers (3)

milos.ai

Reputation: 3930

Solve the problem by changing to:

import numpy as np
np.array('m')+a['x'].astype(str)

For some reason pandas treats this "m" as a time unit. See the explanation from @Daniel Mesejo.

Upvotes: 1

Dani Mesejo

Reputation: 61920

The problem is that the string "m" is interpreted as a timedelta dtype:

from pandas.core.dtypes.common import is_timedelta64_dtype

print(is_timedelta64_dtype("m"))

Output

True
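As a rough illustration of why "m" is special while "b" is not, the sketch below looks at NumPy's one-character dtype codes directly. It assumes that the pandas check ultimately defers to NumPy's dtype parsing, which is not confirmed here; the variable names are only for illustration.

```python
import numpy as np

# "m" is NumPy's one-character type code for timedelta64,
# and "M" is the code for datetime64.
m_dtype = np.dtype("m")
M_dtype = np.dtype("M")

# "b" is also a valid type code, but it names a signed 8-bit integer,
# so a timedelta check would not match it.
b_dtype = np.dtype("b")

is_m_timedelta = np.issubdtype(m_dtype, np.timedelta64)
is_b_timedelta = np.issubdtype(b_dtype, np.timedelta64)

print(m_dtype.kind, is_m_timedelta)
print(b_dtype.kind, is_b_timedelta)
```

So a bare "m" happens to be a valid spelling of a timedelta dtype, whereas "mm" is not a dtype code at all and "b" maps to an integer type.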

The function is_timedelta64_dtype is called when you do:

res = "m" + a['x'].astype(str)

Code (pandas)

elif is_timedelta64_dtype(right):
    # We should only get here with non-scalar or timedelta64('NaT')
    #  values for right
    # Note: we cannot use dispatch_to_index_op because
    #  that may incorrectly raise TypeError when we
    #  should get NullFrequencyError
    orig_right = right
    if is_scalar(right):
        # broadcast and wrap in a TimedeltaIndex
        assert np.isnat(right)
        right = np.broadcast_to(right, left.shape)
        right = pd.TimedeltaIndex(right)

Since the value is also a scalar, it checks whether it is NaT:

assert np.isnat(right)

That assertion is what raises the exception, since np.isnat is only defined for datetime and timedelta values. A simple workaround is to put "m" inside a list:

res = ["m"] + a['x'].astype(str)
print(res)

Output

0    m1
1    m2
2    m3
Name: x, dtype: object
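Another possible workaround, given here only as a sketch and not verified on the affected pandas version, is to do the concatenation in plain Python for each element, so that the scalar "m" is never passed to pandas' arithmetic operators. The lambda and the name `res` are illustrative.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2, 3]})

# Concatenate in plain Python, one element at a time, so no scalar string
# is ever handed to pandas' arithmetic operators.
res = a["x"].astype(str).map(lambda value: "m" + value)
print(res)
```

This is slower than vectorized addition on large columns, but it does not depend on how pandas interprets the prefix string.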

Upvotes: 3

Karthick Mohanraj

Reputation: 1658

This seems to be an issue with the Python front end rather than with the code itself. It may be caused by a conflict when running under Spyder or a Jupyter notebook: I got the same error when running the code in Spyder, and it went away when I ran the same code from a Python session started in the command-line terminal.

Try running the same code in the command-line terminal by invoking the python command; it should work.

Upvotes: 0
