Reputation: 85
I am working with this CSV file: https://drive.google.com/file/d/1o3Nna6CTdCRvRhszA01xB9chawhngGV7/view?usp=sharing
I am trying to sort by the 'Taxes' column, but when I use
import pandas as pd
df = pd.read_csv('statesFedTaxes.csv')
df.Taxes.values.sort_values()
I get
AttributeError: 'numpy.ndarray' object has no attribute 'sort_values'
This is baffling to me and I cannot find a similar problem online. How can I sort the data by the "Taxes" column?
EDIT: I should explain that my real problem is that when I use
df.sort_values('Taxes')
I get this output:
State Taxes
48 Washington 100,609,767
24 Minnesota 102,642,589
25 Mississippi 11,273,202
13 Idaho 11,343,181
30 New Hampshire 12,208,656
54 International 12,611,648
22 Massachusetts 120,035,203
40 Rhode Island 14,325,645
31 New Jersey 140,258,435
So I assume the commas are preventing the column from being sorted numerically. How do I get around this?
Upvotes: 1
Views: 13385
Reputation: 3985
import pandas as pd
df = pd.DataFrame({"Taxes": ["1,000", "100", "100,000"]})
Your DataFrame looks fine when we print it, but the sort order is not numeric.
>>> df.sort_values(by="Taxes")
Taxes
0 1,000
1 100
2 100,000
But the dtype is all wrong. These are strings (stored with dtype object), not numbers, so they are compared character by character. When you call .values
you get a NumPy array of... more strings, not numbers.
>>> df.dtypes
Taxes object
So strip the commas and turn them into numbers:
>>> df['Taxes'] = df['Taxes'].str.replace(",", "").astype(int)
>>> df.sort_values(by="Taxes")
Taxes
1 100
0 1000
2 100000
Now it's fine.
Another option is to read the file with the thousands separator explicitly defined, which fixes the dtype problem at load time.
df = pd.read_csv('statesFedTaxes.csv', thousands=",")
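To see the thousands option in action without the original file, here is a minimal sketch. The inline CSV text is a hypothetical stand-in for statesFedTaxes.csv (I am assuming the numbers are quoted in the file, since they contain commas); the values are copied from the output in the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for statesFedTaxes.csv. The real file layout is an
# assumption; the values are taken from the output shown in the question.
csv_text = '''State,Taxes
Washington,"100,609,767"
Minnesota,"102,642,589"
Mississippi,"11,273,202"
Idaho,"11,343,181"
'''

# thousands="," tells the parser to treat commas inside numbers as
# digit-group separators, so the column is read as integers.
df = pd.read_csv(io.StringIO(csv_text), thousands=",")

# With a numeric dtype, sort_values orders the rows by magnitude.
sorted_df = df.sort_values("Taxes")

print(df.dtypes)
print(sorted_df)
```

If the column still comes back as object after this, something other than commas (for example a currency symbol) is probably in the data, and the str.replace approach above gives you more control.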
Upvotes: 3
Reputation: 420
df.Taxes is a Series object, and df.Taxes.values is a NumPy ndarray object. A Series has a sort_values method, but an ndarray does not, which is why you get the AttributeError. In this case, you're not calling sort_values on the data frame df; you're trying to call it on the raw data from the Taxes column itself.
df.sort_values('Taxes')
will give you df sorted on that column.
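A small sketch of the difference, using a made-up three-row frame (the numbers are illustrative only, not from the CSV):

```python
import numpy as np
import pandas as pd

# Illustrative data only; not taken from the question's file.
df = pd.DataFrame({"Taxes": [300, 100, 200]})

series = df.Taxes          # pandas Series
array = df.Taxes.values    # plain NumPy ndarray

# Only the Series exposes sort_values; the ndarray does not.
has_method_series = hasattr(series, "sort_values")
has_method_array = hasattr(array, "sort_values")

# Sorting the Series keeps the original index labels attached.
sorted_series = series.sort_values()

# To sort the raw array, use NumPy's own sort instead.
sorted_array = np.sort(array)

print(type(series).__name__, has_method_series)
print(type(array).__name__, has_method_array)
print(sorted_series)
print(sorted_array)
```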
Upvotes: 1
Reputation: 658
You basically have the order inverted: sort the DataFrame by the column first, and then extract the values to an array:
df.sort_values("Taxes")["Taxes"].values
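Note that this only gives a numeric order if the column is numeric. A minimal sketch, assuming the column was read in as comma-formatted strings as in the question (the three rows are copied from the question's output):

```python
import pandas as pd

# Rows copied from the question's output; Taxes starts out as strings.
df = pd.DataFrame({
    "State": ["Washington", "Mississippi", "Idaho"],
    "Taxes": ["100,609,767", "11,273,202", "11,343,181"],
})

# Remove the commas as literal characters and convert to integers.
df["Taxes"] = df["Taxes"].str.replace(",", "", regex=False).astype(int)

# Sort the frame first, then pull the column out as a NumPy array.
sorted_values = df.sort_values("Taxes")["Taxes"].values

print(sorted_values)
```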
Upvotes: 2