Rock1432

Reputation: 209

Removing rows from dataframe which don't contain a string of a certain length

I have a dataframe which contains a column with strings of the form XXX/XX/XXX. I want to remove all rows for which the length of the string between the '/'s is not equal to two.

I'm getting "KeyError: True" with the following code:

df_issues = df_new[len(df_new['Job'].str.split('/')[1]) != 2 ]

My approach was to create a series with all rows for which the string length after the first '/' was not equal to 2.

Thanks for any help.

Upvotes: 2

Views: 763

Answers (1)

yatu

Reputation: 88236

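A couple of things are going wrong here:

  • len(x) != 2 returns a single boolean, so you end up evaluating df_new[True]. pandas treats a scalar key as a column label, finds no column named True, and raises KeyError: True. To filter rows you need a boolean Series or array with one value per row, something like df_new[[True, False, True, ...]].
  • df_new['Job'].str.split('/')[1] selects the row with index label 1 (a single list), not the second element of every list. You need the str accessor again, .str[1], to index into each row's list.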
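Both points can be reproduced on a small frame. This is only a sketch: the sample values are made up, and only the names df_new and 'Job' come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'Job' comes from the question.
df_new = pd.DataFrame({'Job': ['ABC/DE/FGH', 'ABC/D/FGH', 'ABC/DEF/GH']})

parts = df_new['Job'].str.split('/')   # one list of pieces per row

# [1] on the Series looks up the row labelled 1, not the second piece of each row.
row_one = parts[1]

# len() of that one list is a single int, so the comparison is a single bool.
flag = len(row_one) != 2

# Indexing the frame with a bare bool is treated as a column lookup.
try:
    df_new[flag]
    raised = False
except KeyError:
    raised = True

# .str[1] takes the second piece of every row instead.
middle = parts.str[1]
print(row_one, flag, raised, middle.tolist())
```

Here row_one is the split list for the second row only, which is why the original expression collapses to a single True.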

Use instead:

df_new[df_new['Job'].str.split('/').str[1].str.len().eq(2)]
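As a rough illustration of the split-based filter, here is the same chain broken into steps on made-up data. The names mask, df_ok and the sample values are assumptions for the example; df_issues is the name used in the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'Job' comes from the question.
df_new = pd.DataFrame({'Job': ['ABC/DE/FGH', 'ABC/D/FGH', 'ABC/DEF/GH']})

# Length of the piece between the first and second '/' for every row.
middle_len = df_new['Job'].str.split('/').str[1].str.len()

# Row-wise boolean mask: True where the middle piece has exactly 2 characters.
mask = middle_len.eq(2)

df_ok = df_new[mask]        # rows to keep
df_issues = df_new[~mask]   # rows whose middle piece is not 2 characters long

print(df_ok)
print(df_issues)
```

Keeping the mask in its own variable makes it easy to get both the kept rows and the rejected rows from one computation.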

Or we could also use str.contains:

# corrected with @jon's remarks
df_new[df_new['Job'].str.contains(r'^.{3}/.{2}/.{3}$', na=False)]
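A small sketch of the regex variant on made-up data (the sample values are assumptions, not from the question). It also shows what na=False does for missing entries.

```python
import pandas as pd

# Hypothetical sample data, including a missing value and a short first part.
df_new = pd.DataFrame({'Job': ['ABC/DE/FGH', 'ABC/D/FGH', 'AB/DE/FGH', None]})

# The pattern requires exactly 3 characters, '/', 2 characters, '/', 3 characters.
# na=False makes missing values count as "no match" instead of producing NaN.
mask = df_new['Job'].str.contains(r'^.{3}/.{2}/.{3}$', na=False)

print(df_new[mask])
```

Note that this pattern is stricter than the split-based check: 'AB/DE/FGH' has a two-character middle part but is still rejected, because the first part is not three characters long.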

Upvotes: 3
