MAK7

Reputation: 125

Find particular format in Pandas Dataframe Cell

I have a dataframe in which I need to check if the content in each cell of a column follows a particular format.

Index    Column A
0        abcd
1        abc_1
2        abc_xy
3        abc_12
4        zabc_12

How can I find the cells that match the format 'abc_' + number, so that the values at index 1 and 3 are found?
So far I know how to look for the 'abc_' part or the numeric part of a cell using regex:

re.match('abc_', df['Column A'])

But I am not sure how to look for the complete pattern. Any help will be appreciated, thank you!

Upvotes: 2

Views: 2500

Answers (1)

Wiktor Stribiżew

Reputation: 627110

You may use Series.str.contains:

df['Column A'].str.contains(r'^abc_\d')

Or, if the pattern should match the whole string:

df['Column A'].str.contains(r'^abc_\d+$')

Note that by default, the pat argument is treated as a regex, so you do not have to pass regex=True. You may use the na argument to set the value returned for missing values.
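As a minimal sketch of how this could be used to pull out the matching rows, here is the sample data from the question with one missing value added (the None row is an assumption for illustration, not part of the original data). Passing na=False makes missing cells count as non-matches, so the result can be used directly as a boolean mask:

```python
import pandas as pd

# Sample data from the question; the trailing None is an added assumption
# to show how the na argument behaves.
df = pd.DataFrame(
    {"Column A": ["abcd", "abc_1", "abc_xy", "abc_12", "zabc_12", None]}
)

# na=False: treat missing values as "no match" so the mask contains
# only True/False and can be used for boolean indexing.
mask = df["Column A"].str.contains(r"^abc_\d+$", na=False)

# Keep only the rows whose whole value is "abc_" followed by digits.
matches = df[mask]
print(matches)
```

With the data above, the rows at index 1 and 3 are the ones selected.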

Pattern details

  • ^ - start of string (you need it here because str.contains uses re.search, not re.match, and thus does not anchor the match at the start of the string)
  • abc_ - a literal substring
  • \d+ - one or more digits (the first pattern uses a single \d, which only requires that one digit follows abc_)
  • $ - end of string.
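To illustrate why the anchors matter, here is a small sketch that uses the standard re module on the question's values in a plain list (not a DataFrame). The variable names are only for illustration:

```python
import re

values = ["abcd", "abc_1", "abc_xy", "abc_12", "zabc_12"]

# Without ^, re.search may match anywhere in the string,
# so "zabc_12" is accepted as well.
unanchored = [v for v in values if re.search(r"abc_\d+$", v)]

# With both ^ and $, only strings made up entirely of
# "abc_" followed by digits are accepted.
anchored = [v for v in values if re.search(r"^abc_\d+$", v)]

# re.fullmatch requires the whole string to match,
# so the anchors can be left out of the pattern.
full = [v for v in values if re.fullmatch(r"abc_\d+", v)]

print(unanchored)
print(anchored)
print(full)
```

The unanchored search also returns zabc_12, while the anchored search and re.fullmatch return only abc_1 and abc_12.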

Upvotes: 5
