aroma

Reputation: 1421

Creating a DataFrame in Pandas using logical and operator

I have a pandas DataFrame with some columns, and I first wanted to print only those rows whose value in a particular column is less than a certain value. So I did:

df[df.marks < 4.5]

It successfully created the DataFrame. Now I want to keep only those rows whose values are in a certain range, so I tried this:

df[(df.marks < 4.5 and df.marks > 4)]

but it's giving me an error:

712         raise ValueError("The truth value of a {0} is ambiguous. "    
713                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 714                          .format(self.__class__.__name__))
715 
716     __bool__ = __nonzero__    
ValueError: The truth value of a Series is ambiguous. Use a.empty,
a.bool(), a.item(), a.any() or a.all().

How do I resolve this? Initially I also thought that it would iterate through all the rows, check the truth value for each one, and then add that row to the DataFrame, but it seems that this isn't the case. If so, how does it decide which rows to add to the DataFrame?

Upvotes: 2

Views: 5527

Answers (3)

jezrael

Reputation: 862771

First, Evert's solution is nice.

Here are two more possible solutions:

... with between:

import pandas as pd
df = pd.DataFrame({'marks':[4.2,4,4.4,3,4.5]})
print (df)
   marks
0    4.2
1    4.0
2    4.4
3    3.0
4    4.5

# pandas >= 1.3 takes a string here; older versions used inclusive=False
df = df[df.marks.between(4, 4.5, inclusive="neither")]
print (df)
   marks
0    4.2
2    4.4

... with query:

df = df.query("marks < 4.5 & marks > 4")
print (df)
   marks
0    4.2
2    4.4
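
As a rough cross-check, the sketch below builds the same filter three ways and compares the results. It assumes pandas 1.3 or later (for the string form of inclusive), and the variable names by_mask, by_between and by_query are only illustrative. Note that inside a query string the word "and" is accepted, unlike in plain Python indexing.

```python
import pandas as pd

df = pd.DataFrame({"marks": [4.2, 4, 4.4, 3, 4.5]})

# Three ways to keep rows with 4 < marks < 4.5 (both bounds exclusive).
by_mask = df[(df.marks > 4) & (df.marks < 4.5)]
by_between = df[df.marks.between(4, 4.5, inclusive="neither")]  # pandas >= 1.3
by_query = df.query("marks > 4 and marks < 4.5")  # "and" is parsed by query itself

# All three should select the same rows.
print(by_mask.equals(by_between) and by_mask.equals(by_query))
```

Which one to use is mostly a matter of taste: the mask form is the most general, while between and query tend to read better for simple range checks.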

Upvotes: 1

Xingzhou Liu

Reputation: 1559

I've run into that problem before. I'm not 100% sure of the cause, but a DataFrame does not accept multiple conditions joined together with and.

  df[(df.marks < 4.5 and df.marks > 4)] -> will fail

Doing something like this will usually solve the problem:

  df[(df.marks < 4.5)] [(df.marks > 4)] 

I don't remember the details of that project right now, but I think quoting them separately works too.
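
A hedged sketch of the chained approach: filtering in two steps works, but as far as I know pandas may warn about reindexing when the second mask is built from the original df rather than from the already filtered frame. Building each mask from the frame it is applied to avoids that. The names step1, step2 and chained are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"marks": [4.2, 4, 4.4, 3, 4.5]})

# Two explicit steps: each mask comes from the frame being filtered.
step1 = df[df.marks < 4.5]
step2 = step1[step1.marks > 4]

# Same idea in one expression: .loc accepts a callable that receives
# the current (already filtered) frame.
chained = df.loc[lambda d: d.marks < 4.5].loc[lambda d: d.marks > 4]

print(step2.equals(chained))
```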

Upvotes: 1

user707650

Reputation:

Use

df[(df.marks < 4.5) & (df.marks > 4)]

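Slightly more generally, element-wise logical operations on arrays are combined with & (not and), with parentheses around the individual conditions, because & binds more tightly than comparison operators such as < and >: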

(a < b) & (c > d)

The same applies to OR combinations (using |) and to more than two conditions.
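
For illustration, a small sketch of those variations on a made-up marks column. The ~ operator (element-wise NOT) is not mentioned above and is included here only as a related assumption; the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"marks": [4.2, 4, 4.4, 3, 4.5]})

# OR: rows outside the open interval (4, 4.5).
outside = df[(df.marks <= 4) | (df.marks >= 4.5)]

# NOT: invert a single condition with ~.
not_low = df[~(df.marks < 4.5)]

# More than two conditions: keep adding parenthesized terms.
narrow = df[(df.marks > 4) & (df.marks < 4.5) & (df.marks != 4.2)]

print(outside.index.tolist(), not_low.index.tolist(), narrow.index.tolist())
```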

This is how it's set up in NumPy, with boolean operators on arrays, and Pandas has copied that behaviour.
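
To make the mechanics concrete, here is a minimal sketch (sample data and variable names are made up). Each comparison produces a boolean Series with one entry per row, & combines them element by element, and df[mask] keeps the rows where the mask is True. Python's and instead asks for a single truth value of the whole Series, which is what raises the ValueError.

```python
import pandas as pd

df = pd.DataFrame({"marks": [4.2, 4, 4.4, 3, 4.5]})

low = df.marks < 4.5   # boolean Series, one value per row
high = df.marks > 4

mask = low & high      # element-wise AND of the two Series
print(mask.tolist())   # [True, False, True, False, False]
print(df[mask])        # rows where the mask is True

try:
    low and high       # Python evaluates bool(low) first, which is ambiguous
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

So no explicit row-by-row loop is written by the user; the comparison and the & are applied to whole columns at once, and the resulting mask selects the rows.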

Upvotes: 4
