ACan

Reputation: 85

Can't sort dataframe column, 'numpy.ndarray' object has no attribute 'sort_values', can't separate numbers with commas

I am working with this CSV file: https://drive.google.com/file/d/1o3Nna6CTdCRvRhszA01xB9chawhngGV7/view?usp=sharing

I am trying to sort by the 'Taxes' column, but when I use

import pandas as pd

df = pd.read_csv('statesFedTaxes.csv')
df.Taxes.values.sort_values()

I get

AttributeError: 'numpy.ndarray' object has no attribute 'sort_values'

This is baffling to me and I cannot find a similar problem online. How can I sort the data by the "Taxes" column?

EDIT: I should explain that my real problem is that when I use

df.sort_values('Taxes')

I get this output:

            State        Taxes
48     Washington  100,609,767
24      Minnesota  102,642,589
25    Mississippi   11,273,202
13          Idaho   11,343,181
30  New Hampshire   12,208,656
54  International   12,611,648
22  Massachusetts  120,035,203
40   Rhode Island   14,325,645
31     New Jersey  140,258,435

So I assume the commas are preventing the column from sorting numerically. How do I get around this?

Upvotes: 1

Views: 13385

Answers (3)

CJR

Reputation: 3985

import pandas as pd
df = pd.DataFrame({"Taxes": ["1,000", "100", "100,000"]})

Your dataframe looks fine when we print it.

>>> df.sort_values(by="Taxes")
     Taxes
0    1,000
1      100
2  100,000

But the dtype is wrong: the column holds strings (stored as object dtype), not numbers, so it sorts character by character. Calling .values just gives you a NumPy array of those same strings.

>>> df.dtypes
Taxes    object

So turn them into numbers

>>> df['Taxes'] = df['Taxes'].str.replace(",", "").astype(int)

>>> df.sort_values(by="Taxes")
    Taxes
1     100
0    1000
2  100000

Now it's fine.
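If you would rather keep the comma-formatted strings in the column, one alternative is the key argument of sort_values (available in pandas 1.1 and later). This is only a sketch of that variation, not something from the original post; it reuses the same three sample values as above.

```python
import pandas as pd

df = pd.DataFrame({"Taxes": ["1,000", "100", "100,000"]})

# Sort by the numeric value of each string without modifying the column.
# The key function receives the whole column as a Series and must return
# a Series of the same length to sort by.
sorted_df = df.sort_values(
    by="Taxes",
    key=lambda col: col.str.replace(",", "", regex=False).astype(int),
)

print(sorted_df)
```

The column stays as text here, so any later arithmetic on it would still need the conversion shown above.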

Another option is to pass the thousands separator to read_csv explicitly, which fixes the dtype problem at load time.

df = pd.read_csv('statesFedTaxes.csv', thousands=",")
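To try this without the file, here is a small sketch that reads an inline stand-in for the CSV. It assumes the real file wraps the Taxes values in quotes (otherwise the commas would be read as field separators); the three rows are copied from the output in the question and the inline text is only for illustration.

```python
import io

import pandas as pd

# Inline stand-in for statesFedTaxes.csv (assumed layout: quoted Taxes values).
csv_text = (
    "State,Taxes\n"
    'Washington,"100,609,767"\n'
    'Mississippi,"11,273,202"\n'
    'Massachusetts,"120,035,203"\n'
)

# thousands="," tells the parser to treat commas inside numbers as separators.
df = pd.read_csv(io.StringIO(csv_text), thousands=",")

print(df.dtypes)

# With an integer column, sorting is numeric rather than alphabetical.
sorted_df = df.sort_values("Taxes")
print(sorted_df)
```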

Upvotes: 3

Christopher Compeau

Reputation: 420

df.Taxes is a Series, and df.Taxes.values is a NumPy ndarray, which has no sort_values method. In your code you're not calling sort_values on the data frame df; you're calling it on the raw array of data from the Taxes column.

df.sort_values('Taxes') will give you df sorted on that column.
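As a rough illustration of the difference between the three objects, the sketch below uses a small made-up frame (the states and numbers are placeholders, not values from the CSV).

```python
import numpy as np
import pandas as pd

# Made-up data, already numeric, just to show which object has which method.
df = pd.DataFrame({"State": ["A", "B", "C"], "Taxes": [300, 100, 200]})

# DataFrame.sort_values: sorts whole rows by the given column.
frame_sorted = df.sort_values("Taxes")

# Series.sort_values: sorts one column and keeps the original index labels.
series_sorted = df["Taxes"].sort_values()

# A NumPy array has no sort_values; use np.sort (or the array's own .sort()).
has_method = hasattr(df["Taxes"].values, "sort_values")
array_sorted = np.sort(df["Taxes"].values)

print(frame_sorted)
print(series_sorted)
print(has_method)
print(array_sorted)
```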

Upvotes: 1

BenB

Reputation: 658

You basically have the order inverted: sort by the column first, then extract its values as an array:

df.sort_values("Taxes")["Taxes"].values

Upvotes: 2
