Karl Baker

Reputation: 913

pandas: Selecting a single value from a df using .loc is producing a df instead of a numeric

I have two dataframes, sarc and non. After running describe() on both, I want to compare the mean value of a particular column across the two dataframes. I used .loc and tried to save the value as a float, but it is saved as a dataframe, which prevents me from comparing the two values with the > operator. Here's my code:

sarc.describe()
        label        c_len    c_s_l_len        score
count  5092.0  5092.000000  5092.000000  5092.000000
mean      1.0    54.876277    33.123527     6.919874
std       0.0    37.536986    22.566558    43.616977
min       1.0     0.000000     0.000000   -96.000000
25%       1.0    29.000000    18.000000     1.000000
50%       1.0    47.000000    28.000000     2.000000
75%       1.0    71.000000    43.000000     5.000000
max       1.0   466.000000   307.000000  2381.000000

non.describe()
        label        c_len    c_s_l_len        score
count  4960.0  4960.000000  4960.000000  4960.000000
mean      0.0    55.044153    33.100806     6.912298
std       0.0    47.873732    28.738776    39.216049
min       0.0     0.000000     0.000000  -119.000000
25%       0.0    23.000000    14.000000     1.000000
50%       0.0    43.000000    26.000000     2.000000
75%       0.0    74.000000    44.000000     4.000000
max       0.0   594.000000   363.000000  1534.000000

non_c_len_mean = non.describe().loc[['mean'], ['c_len']].astype(np.float64) 
sarc_c_len_mean = sarc.describe().loc[['mean'], ['c_len']].astype(np.float64)

if sarc_c_len_mean > non_c_len_mean:
    # do stuff

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The variables are indeed of <class 'pandas.core.frame.DataFrame'> type, and each prints as a labeled 1-row, 1-col df instead of just the value. How can I select only the numeric value as a float?

Upvotes: 0

Views: 41

Answers (1)

BENY

Reputation: 323236

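Remove the [] inside .loc when you pick the index and the column. Passing lists (['mean'], ['c_len']) keeps both axes and returns a 1x1 DataFrame; passing plain labels ('mean', 'c_len') returns the single value: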

non.describe().loc['mean', 'c_len']
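
To illustrate the difference, here is a minimal sketch. The two small frames are made-up stand-ins for sarc and non (the real data isn't shown in the question), so only the selection pattern carries over:

```python
import pandas as pd

# Hypothetical stand-ins for the asker's `sarc` and `non` frames.
sarc = pd.DataFrame({"c_len": [10.0, 20.0, 60.0], "score": [1.0, 2.0, 3.0]})
non = pd.DataFrame({"c_len": [5.0, 15.0, 25.0], "score": [4.0, 5.0, 6.0]})

# List selectors keep both axes, so the result is still a 1x1 DataFrame.
as_frame = sarc.describe().loc[["mean"], ["c_len"]]
print(type(as_frame))

# Scalar labels drop both axes and return the value itself.
sarc_c_len_mean = sarc.describe().loc["mean", "c_len"]
non_c_len_mean = non.describe().loc["mean", "c_len"]
print(type(sarc_c_len_mean))

# The comparison now yields a single boolean, so the `if` works.
if sarc_c_len_mean > non_c_len_mean:
    print("sarc comments are longer on average")
```

If the mean is all you need, sarc['c_len'].mean() should give the same number without building the whole describe() table.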

Upvotes: 1
