gccallie

Reputation: 127

How to drop the first n rows that have a NaN value in the first column?

My dataframe looks like the one in the screenshot (not reproduced here), and I need to drop the first 4 rows because they have NaN as the value in the first column. Since I'll have to do this to slightly different dataframes, I can't just drop them by index. To achieve this I thought of iterating over the df row by row, checking whether the value is NaN using numpy's isnan function, and then dropping the row - sadly it doesn't seem to work.

first_col = df.columns[0]
for i, row in df.iterrows():
    if np.isnan(row[first_col]):
        df.drop(i, axis=0, inplace=True)
    else:
        break

isnan does not work here, though, presumably because the column has object dtype (mixed values), for which np.isnan raises a TypeError. So I tried replacing NaN values with a blank string via df.fillna("", inplace=True) and changed the if condition:

first_col = df.columns[0]
for i, row in df.iterrows():
    if row[first_col] == '':
        df.drop(i, inplace=True, axis=0)
    else:
        break

This works, but it's pretty ugly. Is there a faster/neater way to achieve this?
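For reference, here is a small frame assumed to resemble the screenshot (the values and the second column are placeholders; only the four leading NaNs in the first column matter):

import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataframe in the screenshot:
# the first column starts with four NaN rows, then real values.
df = pd.DataFrame({
    'num.ord.tariffa': [np.nan, np.nan, np.nan, np.nan, 5, 6, 7],
    'other_col': list('abcdefg'),
})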

Upvotes: 1

Views: 2252

Answers (3)

Prateek Jain

Reputation: 196

You may try this:

df['num.ord.tariffa'] = df['num.ord.tariffa'].fillna('Remove')
newdf = df[df['num.ord.tariffa'] != 'Remove']
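For instance, on a small frame like the one in the question (a sketch; the values are made up):

import numpy as np
import pandas as pd

df = pd.DataFrame({'num.ord.tariffa': [np.nan, np.nan, np.nan, np.nan, 5, 6, 7]})

# Replace the NaNs with a sentinel value, then keep only the rows without it.
df['num.ord.tariffa'] = df['num.ord.tariffa'].fillna('Remove')
newdf = df[df['num.ord.tariffa'] != 'Remove']

Note that this removes every row whose value in that column is NaN, not only the leading block; the edit below restricts the check to the first n rows.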

EDIT:

import pandas as pd

final = pd.DataFrame()
n = 4
for index, row in df.iterrows():
    if index < n:
        # within the first n rows, skip rows whose 'c1' is NaN
        # (pd.isna is used because NaN never compares equal to itself, so `== np.nan` is always False)
        if pd.isna(row['c1']):
            pass
        else:
            new = pd.DataFrame([[row['c1'], row['c2']]], columns=['c1', 'c2'])
            final = final.append(new)
    else:
        # past the first n rows, keep everything
        new = pd.DataFrame([[row['c1'], row['c2']]], columns=['c1', 'c2'])
        final = final.append(new)

Upvotes: 1

sophocles

Reputation: 13821

I can't replicate your full dataset because of the way you posted it, but you can do this:

Assume a df (which is similar to your first column):

  num.ord.tariffa
0             NaN
1             NaN
2             NaN
3             NaN
4               5
5               6
6               7

You use .loc and argmax():

new_df = df.loc[df.notnull().all(axis=1).argmax():]

and get back:

  num.ord.tariffa
4               5
5               6
6               7

This removes rows from the top until the first row without NaN, which is your desired result.
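One caveat, sketched below: argmax() returns a position, so if the frame does not have the default RangeIndex, label-based slicing with .loc will not line up with it and .iloc is the safer choice (column name from the question, index values made up):

import numpy as np
import pandas as pd

# Same data as above but with a non-default index, where slicing by label on a position would fail.
df = pd.DataFrame({'num.ord.tariffa': [np.nan, np.nan, np.nan, np.nan, 5, 6, 7]},
                  index=list('abcdefg'))

# .iloc slices by position, so argmax()'s positional result always lines up.
new_df = df.iloc[df.notnull().all(axis=1).argmax():]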

Upvotes: 1

You can drop rows with NaN values, restricting the check to the column you are interested in via subset:

df = df.dropna(subset=['num.ord.tariffa'])
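A quick sketch of this on made-up values; note that it drops every row with NaN in that column, not only the leading block, which matters if NaNs can also appear further down:

import numpy as np
import pandas as pd

df = pd.DataFrame({'num.ord.tariffa': [np.nan, np.nan, np.nan, np.nan, 5, np.nan, 7]})

# Drops rows 0-3 and also row 5, since its value is NaN as well.
df = df.dropna(subset=['num.ord.tariffa'])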

Upvotes: 0
