Reputation: 127
My dataframe looks like this:
And I need to drop the first 4 rows because they have NaN as a value in the first column. Since I'll have to do this to slightly different dataframes, I can't just drop them by index.
To achieve this I thought of iterating over the df by rows, checking if the value is NaN using numpy's isnan function, and then dropping the row - sadly it doesn't seem to work.
import numpy as np

first_col = df.columns[0]
for i, row in df.iterrows():
    if np.isnan(row[first_col]):
        df.drop(i, axis=0, inplace=True)
    else:
        break
isnan does not work here, though.
So I tried replacing NaN values with a blank string using df.fillna("", inplace=True) and changed the if condition:
first_col = df.columns[0]
for i, row in df.iterrows():
    if row[first_col] == '':
        df.drop(i, inplace=True, axis=0)
    else:
        break
This works, but it's admittedly pretty ugly. Is there a faster/neater way to achieve this?
Upvotes: 1
Views: 2252
Reputation: 196
You may try this:
df['num.ord.tariffa'] = df['num.ord.tariffa'].fillna('Remove')
newdf = df[df['num.ord.tariffa'] != 'Remove']
EDIT:
import pandas as pd

final = pd.DataFrame()
n = 4
for index, row in df.iterrows():
    if index < n:
        if pd.isna(row['c1']):
            # skip leading rows whose first column is NaN
            pass
        else:
            new = pd.DataFrame([[row['c1'], row['c2']]], columns=['c1', 'c2'])
            final = pd.concat([final, new])
    else:
        new = pd.DataFrame([[row['c1'], row['c2']]], columns=['c1', 'c2'])
        final = pd.concat([final, new])
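Note that a plain == comparison against np.nan never matches, because NaN compares unequal to everything, including itself; pd.isna (or np.isnan) is the way to test for it. A minimal sketch:

import numpy as np
import pandas as pd

print(np.nan == np.nan)   # False: NaN is not equal to anything, not even itself
print(pd.isna(np.nan))    # True: pd.isna correctly detects NaN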
Upvotes: 1
Reputation: 13821
I can't replicate your full dataset because of the way you posted it, but you can do this. Assume a df which is similar to your first column:
num.ord.tariffa
0 NaN
1 NaN
2 NaN
3 NaN
4 5
5 6
6 7
You use .loc and argmax():
new_df = df.loc[df.notnull().all(axis=1).argmax():]
and get back:
num.ord.tariffa
4 5
5 6
6 7
This removes the np.nan rows up to the first non-NaN row, which is your desired result.
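To see why this works, here is a minimal sketch on the same example column, printing the intermediate boolean mask and the position that argmax() picks out (it relies on the default RangeIndex, where labels and positions coincide):

import numpy as np
import pandas as pd

df = pd.DataFrame({'num.ord.tariffa': [np.nan, np.nan, np.nan, np.nan, 5, 6, 7]})

mask = df.notnull().all(axis=1)   # False for the leading NaN rows, True afterwards
print(mask.tolist())              # [False, False, False, False, True, True, True]
print(mask.argmax())              # 4 -> position of the first all-non-null row

new_df = df.loc[mask.argmax():]   # slice from that row to the end
print(new_df)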
Upvotes: 1
Reputation: 21
You can drop rows with NaN values, restricting the check to the column you are interested in via the subset argument:
df = df.dropna(subset=['num.ord.tariffa'])
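As a minimal sketch of what this does on a column like the one shown above (note that dropna removes every row with a NaN in that column, not only the leading block):

import numpy as np
import pandas as pd

df = pd.DataFrame({'num.ord.tariffa': [np.nan, np.nan, np.nan, np.nan, 5, 6, 7]})

# keep only the rows where 'num.ord.tariffa' is not NaN
df = df.dropna(subset=['num.ord.tariffa'])
print(df)   # rows 4, 5 and 6 remain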
Upvotes: 0