Reputation: 47
I have a DataFrame in which a few of the columns have object dtype, and I want to convert one of them to an int column so I can do some calculations with it. But whenever I try, I get the error below.
Here is the code that gives me the error:
df['Amount in USD']=df['Amount in USD'].str.replace(',', '') #this worked fine
df['Amount in USD']=df['Amount in USD'].astype(int) #but this doesn't
The error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-21-b9d8d4e75b08> in <module>
----> 1 df['Amount in USD']=df['Amount in USD'].astype(int)
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
5870 else:
5871 # else, only a single dtype is given
-> 5872 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
5873 return self._constructor(new_data).__finalize__(self, method="astype")
5874
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
629 self, dtype, copy: bool = False, errors: str = "raise"
630 ) -> "BlockManager":
--> 631 return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
632
633 def convert(
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
425 applied = b.apply(f, **kwargs)
426 else:
--> 427 applied = getattr(b, f)(**kwargs)
428 except (TypeError, NotImplementedError):
429 if not ignore_failures:
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
671 vals1d = values.ravel()
672 try:
--> 673 values = astype_nansafe(vals1d, dtype, copy=True)
674 except (ValueError, TypeError):
675 # e.g. astype_nansafe can fail on object-dtype of strings
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
1072 # work around NumPy brokenness, #1987
1073 if np.issubdtype(dtype.type, np.integer):
-> 1074 return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
1075
1076 # if we have a datetime/timedelta array of objects
pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()
ValueError: invalid literal for int() with base 10: 'undisclosed'
Info about the DataFrame:
0  Sr No              3044 non-null  int64
1  Date dd/mm/yyyy    3044 non-null  object
2  Startup Name       3044 non-null  object
3  Industry Vertical  2873 non-null  object
4  SubVertical        2108 non-null  object
5  City Location      2864 non-null  object
6  Investors Name     3020 non-null  object
7  InvestmentnType    3040 non-null  object
8  Amount in USD      2084 non-null  object
9  Remarks            419 non-null   object
Here is a sample of my DataFrame:
|   | Sr No | Date dd/mm/yyyy | Startup Name | Industry Vertical | SubVertical | City Location | Investors Name | InvestmentnType | Amount in USD | Remarks |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 09/01/2020 | BYJU’S | E-Tech | E-learning | Bengaluru | Tiger Global Management | Private Equity Round | 20,00,00,000 | NaN |
| 1 | 2 | 13/01/2020 | Shuttl | Transportation | App based shuttle service | Gurgaon | Susquehanna Growth Equity | Series C | 80,48,394 | NaN |
| 2 | 3 | 09/01/2020 | Mamaearth | E-commerce | Retailer of baby and toddler products | Bengaluru | Sequoia Capital India | Series B | 1,83,58,860 | NaN |
| 3 | 4 | 02/01/2020 | https://www.wealthbucket.in/ | FinTech | Online Investment | New Delhi | Vinod Khatumal | Pre-series A | 30,00,000 | NaN |
Upvotes: 1
Views: 882
Reputation: 1518
There is a non-numeric string value, 'undisclosed', in df['Amount in USD'], which cannot be converted to int as it is.
You need to map the non-numeric string values to numbers yourself, e.g.:
df['Amount in USD'] = df['Amount in USD'].replace('undisclosed', '-1')
df['Amount in USD'] = df['Amount in USD'].astype(int)
I am assuming here that there are no '-1' values in your df['Amount in USD'] column. You can check the unique values for that column like so:
df['Amount in USD'].unique()
Feel free to add those contents to your question so I can assist you further.
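To list only the entries that block the conversion, rather than every unique value, one option is to filter for values that pd.to_numeric cannot parse. This is a minimal sketch, assuming the commas have already been removed; the small df below is made-up sample data standing in for yours, and the 'N/A' entry is a hypothetical second offender.

```python
import pandas as pd

# Hypothetical sample standing in for the real column (commas already removed).
df = pd.DataFrame(
    {"Amount in USD": ["200000000", "undisclosed", None, "8048394", "N/A"]}
)

col = df["Amount in USD"]

# errors="coerce" turns anything that cannot be parsed into NaN instead of raising.
parsed = pd.to_numeric(col, errors="coerce")

# Keep entries that were present in the original but failed to parse.
bad_values = col[parsed.isna() & col.notna()].unique()
print(list(bad_values))
```

Every string that shows up here needs a decision (replace, drop, or treat as missing) before astype(int) can succeed.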
EDIT Bonus:
Depending on what calculations you want to perform on that column, you need to choose the replacement integers carefully. There are several good guides available online.
Make sure that the choice also fits your domain, which looks like finance to me.
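If a placeholder such as -1 would distort sums or averages, an alternative is to let pandas mark the unparseable entries as missing. The sketch below assumes that treating 'undisclosed' as missing is acceptable for your analysis, and it uses made-up sample data. It also handles the empty cells in your column (the info output shows 2084 non-null out of 3044), which astype(int) cannot convert either.

```python
import pandas as pd

# Hypothetical sample standing in for the real column.
df = pd.DataFrame(
    {"Amount in USD": ["20,00,00,000", "undisclosed", None, "80,48,394"]}
)

# Strip the thousands separators; regex=False treats "," as a literal string.
cleaned = df["Amount in USD"].str.replace(",", "", regex=False)

# Unparseable strings and missing cells both become NaN.
amounts = pd.to_numeric(cleaned, errors="coerce")

# "Int64" (capital I) is the nullable integer dtype, so missing values are kept as <NA>.
# Assumption: all amounts are whole numbers; fractional values would make this cast fail.
df["Amount in USD"] = amounts.astype("Int64")

total = df["Amount in USD"].sum()                # missing values are skipped
n_missing = df["Amount in USD"].isna().sum()
print(total, n_missing)
```

With this approach the missing amounts stay visible as missing, so aggregates such as sum() or mean() are computed over the disclosed values only.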
Upvotes: 1