Reputation: 3
I want to move the rows of my dataframe that have null values in the 'dependents' column into a test set. That way I can train a regression model on the rows without nulls and use it to predict the missing values.
for i in df1:
    if (df1['dependents'].iloc[i].notnull())==False:
        test[i]=df1[i]
So far I have tried this code, but it raises an error:
TypeError Traceback (most recent call last)
<ipython-input-13-975c8029ee0e> in <module>
1 for i in df1:
----> 2 if (df1['dependents'].iloc[i].notnull())==False:
3 test[i]=df1[i]
~\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
1765
1766 maybe_callable = com.apply_if_callable(key, self.obj)
-> 1767 return self._getitem_axis(maybe_callable, axis=axis)
1768
1769 def _is_scalar_access(self, key: Tuple):
~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
2132 key = item_from_zerodim(key)
2133 if not is_integer(key):
-> 2134 raise TypeError("Cannot index by location index with a non-integer key")
2135
2136 # validate the location
TypeError: Cannot index by location index with a non-integer key
Upvotes: 0
Views: 1191
Reputation: 144
The following code puts the rows with null values in 'dependents' into a separate DataFrame:
test = df1[df1['dependents'].isnull()]
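For the regression use case you also need the complementary training set. Here is a minimal sketch on small made-up data (the 'age' column and its values are illustrative assumptions) that builds both sets with isnull() and notnull():

```python
import numpy as np
import pandas as pd

# Small example frame; 'dependents' has one missing value (illustrative data)
df1 = pd.DataFrame({'age': [30, 16, 40, 40, 30],
                    'dependents': [2, 0, 2, np.nan, 3]})

# Rows where 'dependents' is null: the rows whose value you want to predict
test = df1[df1['dependents'].isnull()]
# The complement: rows with a known 'dependents' value, used for training
train = df1[df1['dependents'].notnull()]

print(test)
print(train)
```

Because the two masks are exact complements, every row lands in exactly one of the two frames.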
Upvotes: 1
Reputation: 714
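for i in df1 iterates over the column names rather than the rows, so i is a string such as 'age'. .iloc only accepts integer positions, which is why df1['dependents'].iloc[i] raises "Cannot index by location index with a non-integer key". (Even with an integer position, .iloc[i] returns a scalar, which has no .notnull() method; use pd.isnull() on scalars.) To iterate over the rows, use iterrows() or itertuples(); iteritems() (items() in current pandas) yields columns, not rows. This is explained in this answer: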
import pandas as pd
from numpy import nan
# example data
df1 = pd.DataFrame(
    {'age': [30, 16, 40, 40, 30],
     'gender': ['M', 'F', 'X', 'M', 'F'],
     'dependents': [2, 0, 2, nan, 3]})
# will hold the non-null rows
train = []
# will hold the null rows
test = []
# use iterrows to loop over rows in the dataframe
for i, row in df1.iterrows():
    # read the value from the current row; i is an index label, not a position,
    # so df1['dependents'].iloc[i] would break on a non-default index
    if pd.isnull(row['dependents']):
        test.append(row)
    else:
        train.append(row)
# build dataframe from rows
train_df = pd.DataFrame(train)
test_df = pd.DataFrame(test)
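To see the difference between iterating the DataFrame itself and iterating its rows, here is a quick sketch on a smaller made-up frame. It also reproduces the TypeError from the question by passing a column label to .iloc:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'age': [30, 16, 40],
                    'dependents': [2, np.nan, 3]})

# Iterating over the DataFrame itself yields column labels (strings here)
cols = [c for c in df1]
print(cols)

# iterrows() yields (index label, row as a Series) pairs
labels = [i for i, row in df1.iterrows()]
print(labels)

# itertuples() yields namedtuples; fields are accessed as attributes
null_rows = [r.Index for r in df1.itertuples() if pd.isnull(r.dependents)]
print(null_rows)

# Passing a column label to .iloc (integer positions only) raises TypeError,
# which is what happened inside the question's loop
error = None
try:
    df1['dependents'].iloc['age']
except TypeError as exc:
    error = exc
print(error)
```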
However, iterating over rows like this is usually unnecessary. Boolean indexing does the same split much more efficiently:
train_df = df1[~pd.isnull(df1['dependents'])]
test_df = df1[pd.isnull(df1['dependents'])]
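Once the data is split, the question's last step is to predict the missing values from the training rows. The sketch below uses an ordinary least-squares line fitted with numpy.linalg.lstsq on 'age' alone, purely as a stand-in for whatever regression model you actually use; the choice of feature and model is an assumption, not part of the answer above:

```python
import numpy as np
import pandas as pd

# Example data from the answer above (one missing 'dependents' value)
df1 = pd.DataFrame(
    {'age': [30, 16, 40, 40, 30],
     'gender': ['M', 'F', 'X', 'M', 'F'],
     'dependents': [2, 0, 2, np.nan, 3]})

# Boolean mask of rows whose target is missing
missing = df1['dependents'].isnull()
train_df = df1[~missing]
test_df = df1[missing]

# Design matrix: an intercept column plus 'age' (assumed feature for this sketch)
X_train = np.column_stack([np.ones(len(train_df)),
                           train_df['age'].to_numpy(dtype=float)])
y_train = train_df['dependents'].to_numpy(dtype=float)

# Ordinary least-squares fit: coef = [intercept, slope]
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Predict the missing values and write them into a copy of df1
X_test = np.column_stack([np.ones(len(test_df)),
                          test_df['age'].to_numpy(dtype=float)])
filled = df1.copy()
filled.loc[missing, 'dependents'] = X_test @ coef
print(filled)
```

Writing the predictions back through the same missing mask keeps the result aligned with the original index, so no rows are reordered or lost.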
Upvotes: 0