Reputation: 11
I'm learning Python and working on a project in which I need to split a column. The income column contains values such as 60-80, 120- and 0-40. For the df["min_income"] line I get an "invalid literal for int() with base 10" error, and for the df["max_income"] line I get a "list index out of range" error. Here is my code:
income = df["Income"]
income = income.replace({"Unknown": ""})
df["min_income"] = income.apply(lambda x: int(x.split("-")[0]))
df["max_income"] = income.apply(lambda x: x.split("-")[1])
Running it gives an error like this:
df["min_income"] = income.apply(lambda x: int(x.split("-")[0]))
Traceback (most recent call last):
File "<ipython-input-69-9be6a45724ad>", line 1, in <module>
df["min_income"] = income.apply(lambda x: int(x.split("-")[0]))
File "C:\Users\memin\anaconda3\lib\site-packages\pandas\core\series.py", line 4138, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas\_libs\lib.pyx", line 2467, in pandas._libs.lib.map_infer
File "<ipython-input-69-9be6a45724ad>", line 1, in <lambda>
df["min_income"] = income.apply(lambda x: int(x.split("-")[0]))
ValueError: invalid literal for int() with base 10: ''
I want to split the income column into two separate integer columns, min_income and max_income. I looked the error up online but could not fix the problem. How can I solve this? I also tried .astype(int).
Upvotes: 1
Views: 985
Reputation: 195418
If you have this dataframe:
income
0 60-80
1 0-40
2 120-
3 80-120
4 -255
Then:
df[["min_income", "max_income"]] = df["income"].str.split("-", expand=True)
print(df)
This creates two columns, "min_income" and "max_income":
   income min_income max_income
0   60-80         60         80
1    0-40          0         40
2    120-        120
3  80-120         80        120
4    -255                   255
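You can then fill the blank values as you wish and convert the columns to a numeric format. Note that the split produces strings, and the blanks are empty strings, which is why calling int() on them directly fails.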
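One possible way to do the fill-and-convert step is sketched below. It assumes the blank sides should become missing values rather than, say, 0. Using pd.to_numeric with errors="coerce" and the nullable "Int64" dtype is my choice for this sketch, not something the question requires.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({"income": ["60-80", "0-40", "120-", "80-120", "-255"]})

# Split on "-" into two string columns; a missing side comes back as "".
df[["min_income", "max_income"]] = df["income"].str.split("-", expand=True)

for col in ["min_income", "max_income"]:
    # errors="coerce" turns "" (and any other non-numeric text) into NaN.
    # "Int64" (capital I) is pandas' nullable integer dtype, so the column
    # can hold integers and missing values at the same time.
    df[col] = pd.to_numeric(df[col], errors="coerce").astype("Int64")

print(df)
```

If you would rather have plain int columns, you could fill the missing values first, for example with .fillna(0).astype(int), at the cost of no longer being able to tell an open-ended range from a real 0.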
Upvotes: 0