user6759523
using len() in Pandas dataframe

My DataFrame looks like this:

   StateAb    GivenNm    Surname                  PartyNm PartyAb  ElectedOrder
35      WA        Joe    BULLOCK   Australian Labor Party     ALP             2
36      WA  Michaelia       CASH                  Liberal      LP             3
37      WA      Linda   REYNOLDS                  Liberal      LP             4
38      WA      Wayne  DROPULICH  Australian Sports Party    SPRT             5
39      WA      Scott     LUDLAM          The Greens (WA)     GRN             6

I want to list the senators whose surname is more than 9 characters long.

I thought the code should look like this:

df[len(df.Surname) > 9]

but this raises a KeyError. Where did I go wrong?

Upvotes: 10

Views: 52587

Answers (2)

user2285236
The correct way to filter a DataFrame based on the length of strings in a column is

df[df['Surname'].str.len() > 9]

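df['Surname'].str.len() creates a Series holding the length of each surname, and df[df['Surname'].str.len() > 9] keeps only the rows where that length is greater than 9. Your version, len(df.Surname), measures the Series itself (how many rows it has), so len(df.Surname) > 9 is a single True or False. df[True] or df[False] is then treated as a column lookup, and since no column has that name, pandas raises a KeyError.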
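
To make the difference concrete, here is a minimal sketch that rebuilds the five rows shown in the question (only three of the columns, which is an assumption made for brevity) and compares len() with .str.len(). None of the sample surnames is longer than 9 characters, so a second threshold of 6 is included purely for illustration.

```python
import pandas as pd

# Sample rows copied from the question (index 35-39, subset of columns).
df = pd.DataFrame(
    {
        "StateAb": ["WA"] * 5,
        "GivenNm": ["Joe", "Michaelia", "Linda", "Wayne", "Scott"],
        "Surname": ["BULLOCK", "CASH", "REYNOLDS", "DROPULICH", "LUDLAM"],
    },
    index=range(35, 40),
)

# len() on a Series counts rows, so the comparison yields one plain bool.
row_count = len(df["Surname"])
single_bool = row_count > 9

# Indexing with that bool is a column lookup, which raises KeyError.
try:
    df[single_bool]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# .str.len() works element-wise and returns one length per row.
lengths = df["Surname"].str.len()

# A boolean Series aligned with the rows acts as a row mask.
longer_than_9 = df[lengths > 9]  # empty for this sample
longer_than_6 = df[lengths > 6]  # threshold chosen only for illustration

print(lengths.tolist())
print(longer_than_6["Surname"].tolist())
```

With the full DataFrame from the question, only the `df[lengths > 9]` line is needed.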

Upvotes: 22

Sytse Reitsma

Reputation: 11

Have a look at Python's built-in filter function. It works on plain Python iterables, so in this example df is a list of dicts rather than a pandas DataFrame:

df = [
    {"Surname": "Bullock-ish"},
    {"Surname": "Cash"},
    {"Surname": "Reynolds"},
]
longnames = list(filter(lambda s: len(s["Surname"]) > 9, df))
print(longnames)

>>[{'Surname': 'Bullock-ish'}]
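
If the data is already in a pandas DataFrame, the same idea can be carried over in roughly two ways. This is a sketch under the assumption that the Surname column holds only strings; the names mask, long_rows, records, and long_records are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Surname": ["Bullock-ish", "Cash", "Reynolds"]})

# Option 1: apply the built-in len to each value and use the result as a row mask.
mask = df["Surname"].map(len) > 9
long_rows = df[mask]

# Option 2: convert the rows to dicts and reuse the filter() call from above.
records = df.to_dict("records")
long_records = list(filter(lambda r: len(r["Surname"]) > 9, records))

print(long_rows)
print(long_records)
```

Note that map(len) raises a TypeError on missing values (NaN), whereas .str.len() returns NaN for them, so the latter is usually the safer choice on real data.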

Sytse

Upvotes: 1
