Abhishek

Reputation: 47

Getting an error when trying to convert an object column to int

I have a dataframe in which a few of the columns are of type object, and I want to convert one of them to an int column so I can do some calculations with it. But whenever I try, I get this error.

Here is the code that gives me the error:

df['Amount in USD']=df['Amount in USD'].str.replace(',', '') #this worked fine

df['Amount in USD']=df['Amount in USD'].astype(int) #but this doesn't

The error:

    ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-b9d8d4e75b08> in <module>
----> 1 df['Amount in USD']=df['Amount in USD'].astype(int)

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
   5870         else:
   5871             # else, only a single dtype is given
-> 5872             new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   5873             return self._constructor(new_data).__finalize__(self, method="astype")
   5874 

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
    629         self, dtype, copy: bool = False, errors: str = "raise"
    630     ) -> "BlockManager":
--> 631         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    632 
    633     def convert(

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
    425                     applied = b.apply(f, **kwargs)
    426                 else:
--> 427                     applied = getattr(b, f)(**kwargs)
    428             except (TypeError, NotImplementedError):
    429                 if not ignore_failures:

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
    671             vals1d = values.ravel()
    672             try:
--> 673                 values = astype_nansafe(vals1d, dtype, copy=True)
    674             except (ValueError, TypeError):
    675                 # e.g. astype_nansafe can fail on object-dtype of strings

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
   1072         # work around NumPy brokenness, #1987
   1073         if np.issubdtype(dtype.type, np.integer):
-> 1074             return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
   1075 
   1076         # if we have a datetime/timedelta array of objects

pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()

ValueError: invalid literal for int() with base 10: 'undisclosed'

Info about the data frame:

 0   Sr No              3044 non-null   int64
 1   Date dd/mm/yyyy    3044 non-null   object
 2   Startup Name       3044 non-null   object
 3   Industry Vertical  2873 non-null   object
 4   SubVertical        2108 non-null   object
 5   City  Location     2864 non-null   object
 6   Investors Name     3020 non-null   object
 7   InvestmentnType    3040 non-null   object
 8   Amount in USD      2084 non-null   object
 9   Remarks            419 non-null    object

Here is a sample of my data frame:

  | Sr No | Date dd/mm/yyyy | Startup Name | Industry Vertical | SubVertical | City Location | Investors Name | InvestmentnType | Amount in USD | Remarks
0 | 1 | 09/01/2020 | BYJU’S | E-Tech | E-learning | Bengaluru | Tiger Global Management | Private Equity Round | 20,00,00,000 | NaN
1 | 2 | 13/01/2020 | Shuttl | Transportation | App based shuttle service | Gurgaon | Susquehanna Growth Equity | Series C | 80,48,394 | NaN
2 | 3 | 09/01/2020 | Mamaearth | E-commerce | Retailer of baby and toddler products | Bengaluru | Sequoia Capital India | Series B | 1,83,58,860 | NaN
3 | 4 | 02/01/2020 | https://www.wealthbucket.in/ | FinTech | Online Investment | New Delhi | Vinod Khatumal | Pre-series A | 30,00,000 | NaN

Upvotes: 1

Views: 882

Answers (1)

tafaust

Reputation: 1518

Your df['Amount in USD'] column contains the string 'undisclosed', which cannot be converted to int.

You need to map such non-numeric strings to a numeric placeholder yourself, e.g.:

df['Amount in USD'] = df['Amount in USD'].replace('undisclosed', '-1')
df['Amount in USD'] = df['Amount in USD'].astype(int)

I am assuming here that there are no '-1' values already in your df['Amount in USD'] column. You can check the unique values of that column like so:

df['Amount in USD'].unique()
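If the column has many distinct values, scanning the output of unique() by eye can be tedious. A minimal sketch of one way to list only the entries that fail numeric parsing is shown here. The `amounts` series is a made-up stand-in for your column after the commas have been removed, not your actual data.

```python
import pandas as pd

# Hypothetical sample standing in for df['Amount in USD'] after removing commas.
amounts = pd.Series(["200000000", "8048394", "undisclosed", None, "3000000"])

# errors="coerce" turns anything that cannot be parsed into NaN instead of raising.
parsed = pd.to_numeric(amounts, errors="coerce")

# Entries that were present but did not parse are the ones that break astype(int).
offenders = amounts[parsed.isna() & amounts.notna()]
print(offenders.unique())
```

With your real data, replace `amounts` with df['Amount in USD']; whatever is printed is the set of strings you would have to map or drop before converting.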

Feel free to add those contents to your question so I can assist you further.


EDIT Bonus:

Depending on which calculations you want to perform on that column, you need to choose the placeholder integers carefully. There are several good guides available online.

Make sure that the choice also fits your domain, which looks like finance to me.
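To illustrate why the placeholder matters, here is a rough sketch comparing a -1 sentinel with treating 'undisclosed' as missing. It swaps in pd.to_numeric with errors="coerce" and the nullable "Int64" dtype, which is an alternative to the replace/astype(int) approach above, and the sample values are invented. Note also that your info output shows only 2084 non-null values out of 3044, so the column has missing entries that a plain astype(int) cannot convert either.

```python
import pandas as pd

# Hypothetical sample: two real amounts, one 'undisclosed', one missing value.
amounts = pd.Series(["200000000", "undisclosed", None, "3000000"])

# Unparseable strings become missing; the nullable Int64 dtype keeps integers
# while still allowing missing entries.
as_missing = pd.to_numeric(amounts, errors="coerce").astype("Int64")

# The same data with -1 used as a sentinel for every missing entry.
with_sentinel = as_missing.fillna(-1)

# Aggregations skip missing values but count every -1 as a real amount.
print(as_missing.sum(), with_sentinel.sum())
print(as_missing.mean(), with_sentinel.mean())
```

The sentinel version lowers both the sum and the mean, because each -1 is included as if it were an actual funding amount. For totals or averages, keeping the values missing is probably the safer choice.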

Upvotes: 1
