Reputation: 5097
I have a dataframe df
which looks like:
id location grain
0 BBG.XETR.AD.S XETR 16.545
1 BBG.XLON.VB.S XLON 6.2154
2 BBG.XLON.HF.S XLON NaN
3 BBG.XLON.RE.S XLON NaN
4 BBG.XLON.LL.S XLON NaN
5 BBG.XLON.AN.S XLON 3.215
6 BBG.XLON.TR.S XLON NaN
7 BBG.XLON.VO.S XLON NaN
In reality this dataframe will be much larger. I would like to iterate over this dataframe returning the 'grain'
value but I am only interested in the rows that have a value (not NaN) in the 'grain' column. So only returning as I iterate over the dataframe the following values:
16.545
6.2154
3.215
I can iterate over the dataframe using:
for staticidx, row in df.iterrows():
value= row['grain']
But this returns a value for all rows including those with a NaN value. Is there a way to either remove the NaN rows from the dataframe or skip the rows in the dataframe where grain equals NaN?
Many thanks
Upvotes: 1
Views: 646
Reputation: 210842
Let's compare different methods (for 800K rows DF):
In [21]: df = pd.concat([df] * 10**5, ignore_index=True)
In [22]: df.shape
Out[22]: (800000, 3)
In [23]: %timeit df.grain[~pd.isnull(df.grain)]
The slowest run took 5.33 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 17.1 ms per loop
In [24]: %timeit df.ix[df.grain.notnull(), 'grain']
10 loops, best of 3: 23.9 ms per loop
In [25]: %timeit df[pd.notnull(df['grain'])]
10 loops, best of 3: 35.9 ms per loop
In [26]: %timeit df.grain.ix[df.grain.notnull()]
100 loops, best of 3: 17.4 ms per loop
In [27]: %timeit df.dropna(subset=['grain'])
10 loops, best of 3: 56.6 ms per loop
In [28]: %timeit df.grain[df.grain.notnull()]
100 loops, best of 3: 17 ms per loop
In [30]: %timeit df['grain'].dropna()
100 loops, best of 3: 16.3 ms per loop
Upvotes: 0
Reputation: 109546
You can specify a list of columns in dropna
on which to subset the data:
subset : array-like Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include
>>> df.dropna(subset=['grain'])
id location grain
0 BBG.XETR.AD.S XETR 16.5450
1 BBG.XLON.VB.S XLON 6.2154
5 BBG.XLON.AN.S XLON 3.2150
Upvotes: 1