Reputation: 1185
I'm thoroughly confused as to why I'm getting a ValueError on this code; any help appreciated.
I have a dataframe called global_output with two columns: a column of words and a column of corresponding values. I want to perform a median split on the values and assign the words to two lists--high and low--depending on whether they're above or below the median.
Word Ranking
0 shuttle 0.9075
1 flying 0.7750
2 flight 0.7250
3 trip 0.6775
4 transport 0.6250
5 escape 0.5850
6 trajectory 0.5250
7 departure 0.5175
8 arrival 0.5175
My code for doing this is as follows:
split = global_output['Abstraction'].quantile([0.5])
high = []
low = []
for j in range(len(global_output)):
    if global_output['Ranking'][j] > split:
        low_clt.append(global_output['Word'][j])
    else:
        high.append(global_output['Word'][j])
However, I keep getting this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Now, I understand what the error means: it says I'm trying to evaluate a Series with multiple values as if it were a single value. Nevertheless, I just cannot see how global_output['Ranking'][j] is in any way ambiguous when j takes an integer value from the loop. When I input it into the console, it yields a float value every time. What am I missing here?
Upvotes: 1
Views: 84
Reputation: 863226
You are working with arrays, so it is better to use boolean indexing with a mask, and loc to select the column:
# if you need the Abstraction column instead, change the column name
split = global_output['Ranking'].quantile([0.5]).item()
print (split)
0.625
mask = global_output['Ranking'] <= split
print (mask)
0 False
1 False
2 False
3 False
4 True
5 True
6 True
7 True
8 True
Name: Ranking, dtype: bool
high = global_output.loc[~mask, 'Word'].tolist()
low = global_output.loc[mask, 'Word'].tolist()
print (high)
['shuttle', 'flying', 'flight', 'trip']
print (low)
['transport', 'escape', 'trajectory', 'departure', 'arrival']
Your solution also works; you only need to convert the one-item Series to a scalar with item(), and it seems > has to be <:
split = global_output['Ranking'].quantile([0.5])
print (split)
0.5 0.625
Name: Ranking, dtype: float64
split = global_output['Ranking'].quantile([0.5]).item()
print (split)
0.625
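For reference, the loop from the question could be corrected roughly as follows. This is only a sketch: it rebuilds the sample data from the question, assumes the Ranking column is the one to split on, assumes values equal to the median belong in low (matching the mask above), and passes 0.5 as a plain float so that quantile returns a scalar directly instead of calling item().

```python
import pandas as pd

# Sample data copied from the question.
global_output = pd.DataFrame({
    'Word': ['shuttle', 'flying', 'flight', 'trip', 'transport',
             'escape', 'trajectory', 'departure', 'arrival'],
    'Ranking': [0.9075, 0.7750, 0.7250, 0.6775, 0.6250,
                0.5850, 0.5250, 0.5175, 0.5175],
})

# A float argument (not a list) makes quantile return a scalar.
split = global_output['Ranking'].quantile(0.5)

high = []
low = []
# Iterate over the two columns in parallel instead of indexing by position.
for word, ranking in zip(global_output['Word'], global_output['Ranking']):
    if ranking > split:
        high.append(word)   # strictly above the median
    else:
        low.append(word)    # at or below the median
```

The boolean-indexing version above is usually preferable for larger frames, since it avoids a Python-level loop.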
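You get the error because you compare against a one-item Series: quantile([0.5]) is called with a list, so it returns a Series, and float > Series produces a boolean Series that the if statement cannot reduce to a single True or False.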
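A small illustration of that point, using a made-up four-value Series rather than the full dataframe (the variable names here are only for demonstration):

```python
import pandas as pd

# Hypothetical shortened data, just to reproduce the behaviour.
ranking = pd.Series([0.9075, 0.7750, 0.6250, 0.5175], name='Ranking')

split_series = ranking.quantile([0.5])  # list argument -> one-item Series
split_scalar = ranking.quantile(0.5)    # float argument -> scalar

# Comparing a float with a Series yields a boolean Series, not a bool.
comparison = ranking[0] > split_series

error_message = None
try:
    if comparison:      # truth value of a Series is ambiguous
        pass
except ValueError as exc:
    error_message = str(exc)

# Comparing against the scalar gives a single boolean, which `if` accepts.
is_above = ranking[0] > split_scalar
```

So global_output['Ranking'][j] really is a float, as observed in the console; the Series in the comparison is split.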
Upvotes: 1