Reputation: 913
I have two dataframes, sarc and non. After running describe() on both, I want to compare the mean value of a particular column between them. I used .loc and tried to save the value as a float, but it is saved as a DataFrame, which prevents me from comparing the two values with the > operator. Here's my code:
sarc.describe()
       label        c_len    c_s_l_len        score
count 5092.0  5092.000000  5092.000000  5092.000000
mean     1.0    54.876277    33.123527     6.919874
std      0.0    37.536986    22.566558    43.616977
min      1.0     0.000000     0.000000   -96.000000
25%      1.0    29.000000    18.000000     1.000000
50%      1.0    47.000000    28.000000     2.000000
75%      1.0    71.000000    43.000000     5.000000
max      1.0   466.000000   307.000000  2381.000000
non.describe()
       label        c_len    c_s_l_len        score
count 4960.0  4960.000000  4960.000000  4960.000000
mean     0.0    55.044153    33.100806     6.912298
std      0.0    47.873732    28.738776    39.216049
min      0.0     0.000000     0.000000  -119.000000
25%      0.0    23.000000    14.000000     1.000000
50%      0.0    43.000000    26.000000     2.000000
75%      0.0    74.000000    44.000000     4.000000
max      0.0   594.000000   363.000000  1534.000000
non_c_len_mean = non.describe().loc[['mean'], ['c_len']].astype(np.float64)
sarc_c_len_mean = sarc.describe().loc[['mean'], ['c_len']].astype(np.float64)
if sarc_c_len_mean > non_c_len_mean:
    pass  # do stuff
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
The variables are indeed of type <class 'pandas.core.frame.DataFrame'>, and each prints as a labeled 1-row, 1-column DataFrame instead of just the value. How can I select only the numeric value as a float?
Upvotes: 0
Views: 41
Reputation: 323236
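Remove the inner [] in .loc when you pick the index and the column. Passing lists of labels keeps both axes and returns a DataFrame; passing plain labels returns the single value: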
non.describe().loc['mean', 'c_len']
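To illustrate the difference, here is a minimal sketch. The data is made up for the example; only the names sarc, non, and c_len come from the question.

```python
import pandas as pd

# Hypothetical stand-ins for the question's dataframes (made-up values)
sarc = pd.DataFrame({"c_len": [10, 20, 30], "score": [1, 2, 3]})
non = pd.DataFrame({"c_len": [5, 15, 25], "score": [4, 5, 6]})

# List labels keep both axes, so the result is a 1x1 DataFrame
as_frame = sarc.describe().loc[["mean"], ["c_len"]]

# Plain (scalar) labels return the single value itself
sarc_c_len_mean = sarc.describe().loc["mean", "c_len"]
non_c_len_mean = non.describe().loc["mean", "c_len"]

# The same number without going through describe()
direct = sarc["c_len"].mean()

# Comparing two scalars gives one boolean, so it works in an if statement
longer = sarc_c_len_mean > non_c_len_mean
if longer:
    print("sarc comments are longer on average")
```

If you already have the 1x1 DataFrame, as_frame.iloc[0, 0] should also pull out the value.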
Upvotes: 1