Luis Valencia

Reputation: 34038

unsupported operand type(s) for +: 'float' and 'numpy.str_'

I have the following 2 lists.

['4794447', '1132804', '1392609', '9512999', '2041520', '7233323', '2853077', '4297617', '1321426', '2155664', '13310447', '6066387', '3551036', '4098927', '1865298', '20153634', '1323783', '6070500', '4661537', '2342299', '1302946', '6657982', '2807002', '3032171', '5928040', '2463431', '6131977', '778489']
[0.7142857142857143, 0.35714285714285715, 0.5138888888888888, 0.4583333333333333, 0.6, 0.5675675675675675, 0.589041095890411, 0.43478260869565216, 0.47368421052631576, 0.68, 0.622894633764199, 0.5945945945945946, 0.6338028169014085, 0.42028985507246375, 0.7464788732394366, 0.47593226788432264, 0.39436619718309857, 0.6176470588235294, 0.4142857142857143, 0.618421052631579, 0.5070422535211268, 0.625, 0.5789473684210527, 0.7012987012987013, 0.6533333333333333, 0.43661971830985913, 0.6533333333333333, 0.7222222222222222]

And I need to calculate the correlation so I did this:

    population_by_region = result['Population'].tolist()
    win_loss_by_region = result['wl_ratio'].tolist()
    corr, val = stats.pearsonr(population_by_region, win_loss_by_region)

But I get this error which is not very clear:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-28-37cd29ba1516> in <module>
     66 print(win_loss_by_region)
     67 #print(cities)
---> 68 corr, val = stats.pearsonr(population_by_region, win_loss_by_region)
     69 
     70 print(corr)

/opt/conda/lib/python3.7/site-packages/scipy/stats/stats.py in pearsonr(x, y)
   3403     # that the data type is at least 64 bit floating point.  It might have
   3404     # more precision if the input is, for example, np.longdouble.
-> 3405     dtype = type(1.0 + x[0] + y[0])
   3406 
   3407     if n == 2:

TypeError: unsupported operand type(s) for +: 'float' and 'numpy.str_'

Both lists are the same length!

Upvotes: 0

Views: 316

Answers (1)

jezrael

Reputation: 863641

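I think both columns need to be numeric. The error message shows that the Population values are strings (numpy.str_), so convert them first: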

population_by_region = result['Population'].astype(int).tolist()

Also, converting to a list is not necessary; you can pass both columns directly:

corr, val = stats.pearsonr(result['Population'].astype(int), result['wl_ratio'])
print(corr, val)
-0.04027318804589655 0.8387661496942489
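
If you are not sure every value in the column is a clean integer string, pd.to_numeric is an alternative to astype(int). Here is a minimal self-contained sketch; the small `result` DataFrame is a hypothetical stand-in built from the first five values in the question, and `p_value` is just an illustrative name:

```python
import pandas as pd
from scipy import stats

# Hypothetical stand-in for `result`: first five values from the question.
result = pd.DataFrame({
    "Population": ["4794447", "1132804", "1392609", "9512999", "2041520"],
    "wl_ratio": [0.7142857142857143, 0.35714285714285715,
                 0.5138888888888888, 0.4583333333333333, 0.6],
})

# The column holds strings, so its dtype is not numeric.
print(result["Population"].dtype)

# Convert the strings to numbers; by default this raises on values
# that cannot be parsed instead of silently passing them through.
population = pd.to_numeric(result["Population"])

# pearsonr now receives two numeric sequences of equal length.
corr, p_value = stats.pearsonr(population, result["wl_ratio"])
print(corr, p_value)
```

Checking the dtype first is a quick way to confirm that strings, not the list lengths, are the cause of the TypeError.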

Upvotes: 3
