How to move the rows with null values in a dataframe into another dataframe

Question

I want to use the rows of my dataframe that contain null values as my test set, so that I can train a regression model on the rows with no null values and use it to predict the missing values.

for i in df1:
    if (df1['dependents'].iloc[i].notnull())==False:
        test[i]=df1[i]

So far I have tried this code, but it raises an error:

TypeError                                 Traceback (most recent call last)
 in 
      1 for i in df1:
----> 2     if (df1['dependents'].iloc[i].notnull())==False:
      3         test[i]=df1[i]

~\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   1765 
   1766             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1767             return self._getitem_axis(maybe_callable, axis=axis)
   1768 
   1769     def _is_scalar_access(self, key: Tuple):

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   2132             key = item_from_zerodim(key)
   2133             if not is_integer(key):
-> 2134                 raise TypeError("Cannot index by location index with a non-integer key")
   2135 
   2136             # validate the location

TypeError: Cannot index by location index with a non-integer key

anjsimmo · Accepted Answer

for i in df1 iterates over the column names rather than the rows, so a string such as 'age' ends up being passed to .iloc, which accepts only integer positions. To iterate over the rows, you need to use iterrows() or itertuples(), as explained in this answer:

import pandas as pd
from numpy import nan

# example data
df1 = pd.DataFrame(
    {'age':        [ 30,  16,  40,  40,  30],
     'gender':     ['M', 'F', 'X', 'M', 'F'],
     'dependents': [  2,   0,   2, nan,   3]})

# will hold the non-null rows
train = []
# will hold the null rows
test = []

# use iterrows to loop over (index label, row) pairs in the dataframe
for i, row in df1.iterrows():
    # check the row itself; .iloc[i] would only be right for a default 0..n-1 index
    if pd.isnull(row['dependents']):
        test.append(row)
    else:
        train.append(row)

# build dataframe from rows
train_df = pd.DataFrame(train)
test_df  = pd.DataFrame(test)
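
As a quick check of the diagnosis above, the sketch below reuses the example data to show that looping over a dataframe directly yields its column labels, and that passing such a label to .iloc raises the TypeError from the question. The names columns_seen and iloc_error are illustrative only.

```python
import pandas as pd
from numpy import nan

# same example data as above
df1 = pd.DataFrame(
    {'age':        [ 30,  16,  40,  40,  30],
     'gender':     ['M', 'F', 'X', 'M', 'F'],
     'dependents': [  2,   0,   2, nan,   3]})

# iterating over a dataframe yields its column labels, not its rows
columns_seen = [i for i in df1]

# passing one of those labels to .iloc fails, because .iloc needs integer positions
try:
    df1['dependents'].iloc[columns_seen[0]]
    iloc_error = None
except TypeError as exc:
    iloc_error = exc

print(columns_seen)
print(type(iloc_error).__name__)
```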

However, it's usually not necessary to iterate over rows like this at all. There's a much more efficient way:

train_df = df1[~pd.isnull(df1['dependents'])]
test_df  = df1[pd.isnull(df1['dependents'])]
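
A slightly extended sketch of the vectorized split follows, under a few assumptions that are not part of the original question: the mask is computed once with isna(), the two subsets are copied so they can be modified independently, and a straight-line fit of dependents against age with numpy.polyfit stands in for whatever regression model is actually used. The names missing, predicted and filled are illustrative.

```python
import numpy as np
import pandas as pd

# same example data as above
df1 = pd.DataFrame(
    {'age':        [30, 16, 40, 40, 30],
     'gender':     ['M', 'F', 'X', 'M', 'F'],
     'dependents': [2, 0, 2, np.nan, 3]})

# compute the boolean mask once; True marks rows whose target is missing
missing = df1['dependents'].isna()

# .copy() makes the subsets independent of df1
train_df = df1[~missing].copy()
test_df = df1[missing].copy()

# placeholder model (an assumption): least-squares line dependents ~ age
slope, intercept = np.polyfit(
    train_df['age'].to_numpy(dtype=float),
    train_df['dependents'].to_numpy(dtype=float),
    deg=1)
predicted = slope * test_df['age'] + intercept

# write the predictions back into the rows that were missing
filled = df1.copy()
filled.loc[missing, 'dependents'] = predicted
```

Writing back through .loc with the same mask keeps the predictions aligned with the original row labels, so the known values are left untouched.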
