Sinchetru

Reputation: 571

How to check entries in a column for patterns and calculate the number of patterns?

I have a DataFrame:

         Name         Price
0        Dictionary     3
1        Book           4
2        Dict En-Ru     2
3        BookforKids    6
4        Dict FR-CHN    1

I need a piece of code that checks the 'Name' column for patterns I specify myself and reports the number of matches for each pattern in another DataFrame.

For instance, counting the entries in the 'Name' column that match the patterns Dict and Book, ignoring case, should give this result:

| Pattern     | Occurrences |
| ----------- | ----------- |
| Dict        | 3           |
| Book        | 2           |

Upvotes: 0

Views: 50

Answers (2)

yatu

Reputation: 88275

Here's one way using str.extract:

patterns = ['Dict','Book']
df.Name.str.extract(rf"({'|'.join(patterns)})", expand=False).value_counts()

Dict    3
Book    2
Name: 0, dtype: int64
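Since the question asks for the counts in another DataFrame, here is a minimal sketch of one way to reshape the result. The DataFrame is rebuilt from the question so the snippet runs on its own; the column names `Pattern` and `Occurrences` are taken from the desired output, and the `rename_axis` / `reset_index` step is my addition rather than part of the answer above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["Dictionary", "Book", "Dict En-Ru", "BookforKids", "Dict FR-CHN"],
    "Price": [3, 4, 2, 6, 1],
})
patterns = ["Dict", "Book"]

# Extract the first matching pattern per row, then count each one
counts = df["Name"].str.extract(rf"({'|'.join(patterns)})", expand=False).value_counts()

# Name the index and the values explicitly so the column labels
# do not depend on the pandas version
result = counts.rename_axis("Pattern").reset_index(name="Occurrences")
print(result)
```

Rows that match none of the patterns come back as NaN from `str.extract` and are dropped by `value_counts` by default.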

You can make it case-insensitive by lowercasing both the patterns and the column before extracting (the resulting labels are then lowercase):

patterns_lower = '|'.join([s.lower() for s in patterns])
(df.Name.str.lower().str.extract(rf"({patterns_lower})", expand=False)
        .value_counts())
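As an alternative to lowercasing, `str.extract` also accepts a `flags` argument, so `re.IGNORECASE` can be passed directly. The sketch below assumes the sample data from the question. The `re.escape` call and the `canonical` mapping are my own additions: the first guards against patterns containing regex metacharacters, the second maps each match back to the spelling used in `patterns`, because the extracted text keeps whatever casing appears in the column.

```python
import re
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["Dictionary", "Book", "Dict En-Ru", "BookforKids", "Dict FR-CHN"],
    "Price": [3, 4, 2, 6, 1],
})
patterns = ["dict", "BOOK"]  # deliberately not matching the column's casing

# Build one alternation group; escape each pattern so it is matched literally
regex = "(" + "|".join(re.escape(p) for p in patterns) + ")"
matched = df["Name"].str.extract(regex, flags=re.IGNORECASE, expand=False)

# Map each match back to the spelling given in `patterns`
canonical = {p.lower(): p for p in patterns}
counts = matched.str.lower().map(canonical).value_counts()
print(counts)
```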

Upvotes: 2

Bruno Mello

Reputation: 4618

You can define your pattern as a custom function:

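import numpy as np

# Example: return the first pattern found in the text, or NaN if none match.
# Note that `in` is case-sensitive.
def get_pattern(txt):
    if 'Dict' in txt:
        return 'Dict'
    if 'Book' in txt:
        return 'Book'

    return np.nan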

Then apply it to the 'Name' column and use value_counts:

df['Pattern'] = df['Name'].apply(get_pattern)
df['Pattern'].value_counts()

Dict    3
Book    2
dtype: int64
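Both approaches above assign each row to at most one pattern (the first one that matches). If a name could contain several patterns and each should be counted, one possible sketch is to test every pattern separately with `str.contains`. This is a different technique from the answers above, shown under the assumption that the sample data from the question is used; `case=False` ignores case and `regex=False` treats each pattern as a literal substring.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["Dictionary", "Book", "Dict En-Ru", "BookforKids", "Dict FR-CHN"],
    "Price": [3, 4, 2, 6, 1],
})
patterns = ["Dict", "Book"]

# For each pattern, count the rows whose Name contains it
result = pd.DataFrame({
    "Pattern": patterns,
    "Occurrences": [
        int(df["Name"].str.contains(p, case=False, regex=False).sum())
        for p in patterns
    ],
})
print(result)
```

With this version a pattern that never matches still appears in the result with a count of 0, and the rows keep the order given in `patterns`.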

Upvotes: 0
