Lodore66

Reputation: 1185

Why is this value coming up as ambiguous?

I'm thoroughly confused as to why I'm getting a ValueError on this code; any help appreciated.

I have a dataframe called global_output with two columns: a column of words and a column of corresponding values. I want to perform a median split on the values and assign the words to two lists, high and low, depending on whether they're above or below the median.

       Word         Ranking
0      shuttle      0.9075
1      flying       0.7750
2      flight       0.7250
3      trip         0.6775
4      transport    0.6250
5      escape       0.5850
6      trajectory   0.5250
7      departure    0.5175
8      arrival      0.5175

My code for doing this is as follows:

split = global_output['Abstraction'].quantile([0.5])

high = []
low = []


for j in range(len(global_output)):
    if global_output['Ranking'][j] > split:
        low.append(global_output['Word'][j])
    else:
        high.append(global_output['Word'][j])

However, I keep getting this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Now, I understand what the error means: it says I'm trying to evaluate a Series with multiple values as if it were a single value. Nevertheless, I just cannot see how

global_output['Ranking'][j]

is in any way ambiguous when j takes an integer value from the loop. When I input it into the console, it yields a float value every time. What am I missing here?

Upvotes: 1

Views: 84

Answers (1)

jezrael

Reputation: 863226

You are working with arrays, so it is better to use boolean indexing with a mask, and loc to select the column:

# change 'Ranking' to 'Abstraction' here if that is the column you need
split = global_output['Ranking'].quantile([0.5]).item()
print (split)
0.625

mask = global_output['Ranking'] <= split
print (mask)
0    False
1    False
2    False
3    False
4     True
5     True
6     True
7     True
8     True
Name: Ranking, dtype: bool

high = global_output.loc[~mask, 'Word'].tolist()
low = global_output.loc[mask, 'Word'].tolist()

print (high)
['shuttle', 'flying', 'flight', 'trip']

print (low)
['transport', 'escape', 'trajectory', 'departure', 'arrival']

Your solution also works; you only need to convert the one-item Series to a scalar with item(), and it seems > should be <:

split = global_output['Ranking'].quantile([0.5])
print (split)
0.5    0.625
Name: Ranking, dtype: float64

split = global_output['Ranking'].quantile([0.5]).item()
print (split)
0.625
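
For completeness, here is a minimal sketch of the original loop with both changes applied. The DataFrame is rebuilt from the sample in the question, and the 'Ranking' column is assumed to be the one intended (swap in 'Abstraction' if that is the real column):

```python
import pandas as pd

# Sample data copied from the question.
global_output = pd.DataFrame({
    "Word": ["shuttle", "flying", "flight", "trip", "transport",
             "escape", "trajectory", "departure", "arrival"],
    "Ranking": [0.9075, 0.7750, 0.7250, 0.6775, 0.6250,
                0.5850, 0.5250, 0.5175, 0.5175],
})

# quantile([0.5]) returns a one-item Series; item() unwraps it to a plain float.
split = global_output["Ranking"].quantile([0.5]).item()

high = []
low = []

for j in range(len(global_output)):
    # Both sides are scalars now, so the comparison yields a single bool.
    if global_output["Ranking"][j] < split:
        low.append(global_output["Word"][j])
    else:
        high.append(global_output["Word"][j])

print(high)
print(low)
```

Note that with a strict < the word sitting exactly on the median ('transport') ends up in high, whereas the mask above uses <= and puts it in low. Pick whichever boundary rule you need.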

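You get the error because you are comparing against a one-item Series rather than a scalar: quantile([0.5]) returns a Series, so the comparison itself yields a Series, and the if statement cannot evaluate its truth value.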
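
A small reproduction may make this easier to see. It is only a sketch on a three-value toy Series (the names ranking, split_series, split_scalar and raised are made up for illustration), but it shows that the float on the left is not the ambiguous part; the Series on the right is:

```python
import pandas as pd

ranking = pd.Series([0.9075, 0.7750, 0.6250], name="Ranking")

split_series = ranking.quantile([0.5])  # list argument -> one-item Series
split_scalar = ranking.quantile(0.5)    # scalar argument -> plain float

# float compared with a Series gives a boolean Series, even if it has one item.
comparison = ranking.iloc[0] > split_series
print(type(comparison), len(comparison))

try:
    if comparison:  # pandas refuses to turn a Series into a single bool
        pass
    raised = False
except ValueError as err:
    raised = True
    print(err)

# With a scalar on the right-hand side the same test is an ordinary bool.
print(ranking.iloc[0] > split_scalar)
```

So passing a scalar to quantile (0.5 instead of [0.5]) is another way to avoid the problem, as an alternative to calling item().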

Upvotes: 1
