user3841581

Reputation: 2747

Making a pandas DataFrame based on some column values of another DataFrame

I have a pandas DataFrame df1 with the following content:

Serial N         year         current
   B              10            14
   B              10            16
   B              11            10
   B              11            
   B              11            15
   C              12            11
   C                            9
   C              12            13
   C              12             

I would like to make a DataFrame that is based on df1 but that has any row containing an empty value removed. For example:

Serial N         year         current
   B              10            14
   B              10            16
   B              11            10
   B              11            15
   C              12            11
   C              12            13  

I tried something like this:

df1=df[~np.isnan(df["year"]) or ~np.isnan(df["current"])]

But I received the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What could be the problem?

Upvotes: 1

Views: 54

Answers (3)

Thanos

Reputation: 2572

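Use a bitwise operator instead of or. Because you want to drop a row when either column is empty, the two not-NaN conditions have to be combined with & (using | would still keep rows where only one of the two values is missing):

df1 = df[(~np.isnan(df["year"])) & (~np.isnan(df["current"]))]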

Using dropna(), as suggested by EdChum, is likely the cleanest solution here. The pandas documentation on working with missing data covers this in more detail.

Upvotes: 2

EdChum

Reputation: 394041

You can just call dropna to achieve this:

df1 = df.dropna()

As for why your attempt failed: the or operator does not know what to do with array-like structures, because it is ambiguous whether one or more elements have to meet the boolean criterion. Use the bitwise operators &, | and ~ for and, or and not respectively. Additionally, with multiple conditions you need to wrap each condition in parentheses because of operator precedence.

In [4]:
df.dropna()

Out[4]:
  Serial N  year  current
0        B    10       14
1        B    10       16
2        B    11       10
4        B    11       15
5        C    12       11
7        C    12       13
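To make the point about or versus the bitwise operators concrete, here is a minimal sketch. The sample frame is my reconstruction of the question's data, assuming the empty cells are NaN, and I use Series.notna() in place of ~np.isnan(...) (it gives the same mask on float columns).

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question (assumption: empty cells are NaN).
df = pd.DataFrame({
    "Serial N": list("BBBBBCCCC"),
    "year": [10, 10, 11, 11, 11, 12, np.nan, 12, 12],
    "current": [14, 16, 10, np.nan, 15, 11, 9, 13, np.nan],
})

# Python's `or` calls bool() on the first Series, which pandas refuses to do.
try:
    df[df["year"].notna() or df["current"].notna()]
    raised = False
except ValueError:
    raised = True

# Element-wise AND keeps only the rows where both columns are present.
mask = df["year"].notna() & df["current"].notna()
filtered = df[mask]

# The same rows come back from dropna restricted to those two columns.
same = filtered.equals(df.dropna(subset=["year", "current"]))
```

If only some columns should decide whether a row is dropped, the subset argument of dropna is usually easier to read than a hand-built mask.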

Upvotes: 2

MaxU - stand with Ukraine

Reputation: 210842

If you really have empty cells (empty strings) instead of NaNs:

In [122]: df
Out[122]:
  Serial_N  year current
0        B  10.0    14.0
1        B  10.0    16.0
2        B  11.0    10.0
3        B  11.0
4        B  11.0    15.0
5        C  12.0    11.0
6        C           9.0
7        C  12.0    13.0
8        C  12.0

In [123]: df.replace('', np.nan).dropna()
Out[123]:
  Serial_N  year current
0        B  10.0    14.0
1        B  10.0    16.0
2        B  11.0    10.0
4        B  11.0    15.0
5        C  12.0    11.0
7        C  12.0    13.0
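Note that after replacing empty strings the columns typically stay as object dtype. If you also want numeric columns back, one possible variant (a sketch, not the approach above) is to coerce with pd.to_numeric before dropping. The sample data and the names cols and cleaned are made up for illustration, assuming the values were read in as strings.

```python
import pandas as pd

# Hypothetical frame where missing values arrived as empty strings.
df = pd.DataFrame({
    "Serial_N": list("BBBBBCCCC"),
    "year": ["10", "10", "11", "11", "11", "12", "", "12", "12"],
    "current": ["14", "16", "10", "", "15", "11", "9", "13", ""],
})

cols = ["year", "current"]
cleaned = df.copy()

# errors="coerce" turns anything that cannot be parsed as a number,
# including '', into NaN and yields numeric columns.
cleaned[cols] = cleaned[cols].apply(pd.to_numeric, errors="coerce")

# Drop rows where either of the two columns ended up missing.
cleaned = cleaned.dropna(subset=cols)
```

Be aware that coercion also silently turns any other non-numeric text into NaN, so it is only appropriate when those columns are supposed to be purely numeric.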

Upvotes: 2
