I have a large DataFrame (over a million columns) and would like to keep only the columns whose values fall within a range, but the range is different for each row. I was thinking of defining two separate Series, one holding the upper limit and one the lower limit for each row, to use in the comparison, but I don't know the most efficient way to do this. For example, if `a` is a single column from my large DataFrame, I only want to keep it if the value in each row falls between the corresponding values of `a_low` and `a_high` (inclusive). In the example below, `a` meets the criteria.
import pandas as pd

a = pd.Series([1, 4, 5, 2, 3, 3, 5, 7])
a_high = pd.Series([2, 4, 6, 2, 4, 4, 6, 8])
a_low = pd.Series([0, 2, 4, 0, 2, 2, 4, 6])
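For a single column, the check can be written without any explicit loop. The sketch below uses `Series.between`, which compares element-wise after aligning on the index and is inclusive on both ends by default. That matches the example, where the second row has 4 against an upper limit of 4. The names `in_range` and `keep_a` are illustrative only.

```python
import pandas as pd

# The example data from the question
a = pd.Series([1, 4, 5, 2, 3, 3, 5, 7])
a_high = pd.Series([2, 4, 6, 2, 4, 4, 6, 8])
a_low = pd.Series([0, 2, 4, 0, 2, 2, 4, 6])

# Element-wise a_low <= a <= a_high, aligned on the shared index;
# bounds are inclusive on both ends by default
in_range = a.between(a_low, a_high)

# Keep the column only if every row is in range
keep_a = bool(in_range.all())
print(keep_a)  # True
```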
Because my DataFrame is so big, I'm trying to avoid looping through each row. Do you have any suggestions? I was wondering whether `df.apply()` or a list comprehension might help here. Thanks!
-Zack
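The following is a minimal sketch of the whole-DataFrame version. It assumes a single pair of per-row bound Series (`low`, `high`) shared by every column and indexed like `df`. The frame `df` and its extra columns `b` and `c` are hypothetical test data. The vectorized comparison uses `DataFrame.ge`/`DataFrame.le` with `axis=0` so the Series lines up with the rows. The `df.apply` variant is shown only for comparison: it calls the lambda once per column in Python, so it will generally be slower on a very wide frame. The NumPy variant skips index alignment and assumes the rows of `df`, `low`, and `high` are already in the same order.

```python
import pandas as pd

# Hypothetical frame: "a" is the question's example, "b" breaks the
# range in the first row, "c" in the last row
df = pd.DataFrame({
    "a": [1, 4, 5, 2, 3, 3, 5, 7],
    "b": [3, 4, 5, 2, 3, 3, 5, 7],
    "c": [1, 4, 5, 2, 3, 3, 5, 9],
})
# Per-row bounds shared by every column (same index as df)
low = pd.Series([0, 2, 4, 0, 2, 2, 4, 6])
high = pd.Series([2, 4, 6, 2, 4, 4, 6, 8])

# Vectorized: axis=0 aligns the Series with df's rows, not its columns
mask = df.ge(low, axis=0) & df.le(high, axis=0)
keep = mask.all()            # one boolean per column
result = df.loc[:, keep]
print(list(result.columns))  # ['a']

# Same check on the raw NumPy array (assumes identical row order);
# [:, None] turns the bounds into column vectors that broadcast across columns
vals = df.to_numpy()
keep_np = ((vals >= low.to_numpy()[:, None]) &
           (vals <= high.to_numpy()[:, None])).all(axis=0)
result_np = df.loc[:, keep_np]

# apply-based version for comparison: one Python call per column
keep_apply = df.apply(lambda col: col.between(low, high).all())
```

All three produce the same column selection here; the two vectorized forms avoid a Python-level loop over the columns, which is what matters with a million of them.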