theprowler
theprowler

Reputation: 3600

Pandas - remove every NaN from dataframe

I have a dataframe with NaNs scattered throughout it and would like to remove them all so I'm just left with my data.

Here is a printout of my dataframe fish_frame:

fish_frame:                         0       1       2         3  \
0                   735-8     NaN     NaN       NaN   
1                     NaN     NaN     NaN  LIVE WGT   
2                 GBE COD     NaN     NaN       600   
3                 GBW COD     NaN  11,189       NaN   
4                 GOM COD     NaN       0       NaN   
5                 POLLOCK     NaN     NaN     1,103   
6                   WHAKE     NaN     NaN        12   
7             GBE HADDOCK     NaN  10,730       NaN   
8             GBW HADDOCK     NaN  64,147       NaN   
9             GOM HADDOCK     NaN       0       NaN   
10                REDFISH     NaN     NaN         0   
11         WITCH FLOUNDER     NaN     370       NaN   
12                 PLAICE     NaN     NaN       622   
13     GB WINTER FLOUNDER  54,315     NaN       NaN   
14    GOM WINTER FLOUNDER     653     NaN       NaN   
15  SNEMA WINTER FLOUNDER  14,601     NaN       NaN   
16          GB YELLOWTAIL     NaN   1,663       NaN   
17       SNEMA YELLOWTAIL     NaN   1,370       NaN   
18       CCGOM YELLOWTAIL   1,812     NaN       NaN   

                            4      6        package_deal_column Package_Price  
0                         NaN    NaN  Package Deal - $40,753.69           nan  
1                         NaN  TOTAL  Package Deal - $40,753.69           nan  
2                         NaN    NaN  Package Deal - $40,753.69          None  
3                         NaN    NaN  Package Deal - $40,753.69          None  
4   Package Deal - $40,753.69   None  Package Deal - $40,753.69          None  
5                         NaN    NaN  Package Deal - $40,753.69          None  
6                         NaN    NaN  Package Deal - $40,753.69          None  
7                         NaN    NaN  Package Deal - $40,753.69          None  
8                         NaN    NaN  Package Deal - $40,753.69          None  
9                         NaN    NaN  Package Deal - $40,753.69          None  
10                        NaN    NaN  Package Deal - $40,753.69          None  
11                        NaN    NaN  Package Deal - $40,753.69          None  
12                        NaN    NaN  Package Deal - $40,753.69          None  
13                        NaN   None  Package Deal - $40,753.69          None  
14                        NaN   None  Package Deal - $40,753.69          None  
15                        NaN   None  Package Deal - $40,753.69          None  
16                        NaN    NaN  Package Deal - $40,753.69          None  
17                        NaN    NaN  Package Deal - $40,753.69          None  
18                        NaN   None  Package Deal - $40,753.69          None

Ideally, for columns 2, 3, and 4 I would like there to be no NaNs and for the numbers to all be in one column instead of three.

I used the command fish_frame = fish_frame.dropna(axis=1, how='all') but it didn't alter the dataframe at all.

Any help would be appreciated thanks.

Upvotes: 2

Views: 8384

Answers (3)

seralouk
seralouk

Reputation: 33137

You need to use this:

df = pd.read_csv('fish.csv',header = None)

df_new = df.convert_objects(convert_numeric=True)

df_new = df_new.fillna(value=0)

This will replace all the NaN and strings with 0.

Then you can add the 3 columns and get 1 columns with all the numbers as you said.

df_total = df_new[1] + df_new[2] + df_new[3]

This approach would only work in your case since you do not have values at the same lines of the same columns.

Upvotes: 1

Corley Brigman
Corley Brigman

Reputation: 12371

Doing the add could be something like

 df['TotalVal'] = df[[0,1,2]].fillna(0).T.sum()

or

 df['TotalVal'] = df[[0,1,2]].fillna(0).sum(axis=1)

Though... based on your questions, and the particular ways the data is 'weird', I might actually suggest you do the cleaning before this goes into pandas - it would be a lot easier to do I think, using the csv module to create a set of tuples, and build the dataframe from there.

I'm not an expert, and it took me a while to wrap my head around the pandas paradigms, but you have to think functionally (think results, not process) and consider shapes of data.

In this case, we want a single series with the same index as the original dataframe. So first, extract columns of interest. Since we don't have a named index, to ensure alignment, we don't want to drop or add any data. I use fillna just to replace the NaNs with a valid value that will be a no-op for my chosen algorithm.

This gives me a modified dataframe with 3 columns and my original index. Most pandas functions act on columns, but what we want is a sum of each row. So T transposes the dataframe so that instead of each row being a fish with 3 data samples, each column is a fish with 3 data rows. (you can also just tell it to sum over axis=1, which the description is backwards from what I understand it, but axis=1 means 'sum over columns' = per row)

Then we sum them, which will sum each column and, since that's just a single data point, also collapse them back into a single series. Which is what we want, and we can just drop it in the dataframe at that point.

Upvotes: 1

Claudiu Creanga
Claudiu Creanga

Reputation: 8366

fish_frame.dropna(axis=1, how='any') 

will remove any columns that have at least one NaN.

Upvotes: -1

Related Questions