kirti purohit

Reputation: 431

Regex pattern matching: finding a substring in words in a CSV file

'Neighborhood,eattend10,eattend11,eattend12,eattend13,mattend10,mattend11,mattend12,mattend13,
hsattend10,hsattend11,hsattend12,hsattend13,eenrol11,eenrol12,eenrol13,menrol11,menrol12,
menrol13,hsenrol11,hsenrol12,hsenrol13,aastud10,aastud11,aastud12,aastud13,wstud10,wstud11,
wstud12,wstud13,hstud10,hstud11,hstud12,hstud13,abse10,abse11,abse12,abse13,absmd10,absmd11,
absmd12,absmd13,abshs10,abshs11,abshs12,abshs13,susp10,susp11,susp12,susp13,farms10,farms11,
farms12,farms13,sped10,sped11,sped12,sped13,ready11,ready12,ready13,math310,math311,math312,
math313,read310,read311,read312,read313,math510,math511,math512,math513,read510,read511,read512,
read513,math810,math811,math812,math813,read810,read811,read812,read813,hsaeng10,hsaeng11,
hsaeng12,hsaeng13,hsabio10,hsabio11,hsabio12,hsabio13,hsagov10,hsagov11,hsagov13,hsaalg10,
hsaalg11,hsaalg12,hsaalg13,drop10,drop11,drop12,drop13,compl10,compl11,compl12,compl13,
sclsw11,sclsw12,sclsw13,sclemp13\

I have this data set. I need to count how many of the words contain "drop" and print them.

The same should work for any other word, such as mattend.

I tried re.findall, but I don't think that is the right approach.

I assume re.search or re.match could be used instead. How can I do this with a regex?

Upvotes: 0

Views: 798

Answers (2)

Arzybek

Reputation: 802

I think re.findall is the right tool here. From the Python re module documentation:

Search:

Scan through string looking for the first location where this regular expression produces a match, and return a corresponding match object.

Match:

If zero or more characters at the beginning of string match this regular expression, return a corresponding match object.

Findall:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.
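To make the difference concrete, here is a small sketch on a shortened copy of the header from the question. The variable names and the shortened string are my own, chosen only for illustration.

```python
import re

# Shortened sample of the header from the question (assumed to be one string).
header = "Neighborhood,eattend10,drop10,drop11,drop12,drop13,compl10"

# re.search returns only the first match found anywhere in the string.
first = re.search(r"drop\d*", header)
print(first.group())  # drop10

# re.match only tries at the very start of the string, so it finds nothing here.
no_match = re.match(r"drop\d*", header)
print(no_match)  # None

# re.findall returns every non-overlapping match as a list of strings.
all_matches = re.findall(r"drop\d*", header)
print(all_matches)  # ['drop10', 'drop11', 'drop12', 'drop13']
```

So search and match give you at most one hit, while findall gives you all of them, which is what counting needs.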

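I tried it on your example and it worked for me: re.findall("drop", text), where text is the string holding your data (avoid naming it str, which shadows the built-in type).

If you also want the digits after it, use a raw string so that \d is passed to the regex engine unchanged: re.findall(r"drop\d*", text)

To count the words, use: len(re.findall(r"drop\d*", text))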
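As a rough sketch of counting and printing for any word, the helper below wraps the same findall call. The function name find_columns and the shortened header are hypothetical, and the \b word boundaries are my addition so that a prefix such as mattend is not also picked up inside a longer name.

```python
import re

# Shortened sample of the header from the question (assumed to be one string).
header = ("Neighborhood,eattend10,eattend11,mattend10,mattend11,"
          "hsattend10,drop10,drop11,drop12,drop13")

def find_columns(prefix, text):
    """Return every name made of `prefix` followed by optional digits."""
    # re.escape guards against regex metacharacters in the prefix.
    pattern = r"\b" + re.escape(prefix) + r"\d*\b"
    return re.findall(pattern, text)

drops = find_columns("drop", header)
print(len(drops), drops)  # 4 ['drop10', 'drop11', 'drop12', 'drop13']

mattends = find_columns("mattend", header)
print(len(mattends), mattends)  # 2 ['mattend10', 'mattend11']
```

Without the \b anchors, searching for attend would also hit eattend10, mattend10 and hsattend10, which may or may not be what you want.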

Upvotes: 1

wasif

Reputation: 15488

You can use len() on re.findall() to get the length of the returned list:

import re

# Read the whole file into one string.
with open('example.csv') as f:
    data = f.read().strip()

# Count every occurrence of "drop" in the text.
print(len(re.findall('drop', data)))
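If the names are the header row of the CSV, another option is to parse that row with the csv module and test each name separately. This is only a sketch: the io.StringIO object stands in for open('example.csv'), and the sample contents (including the data row) are made up for illustration.

```python
import csv
import io
import re

# Stand-in for open('example.csv'); the contents are an assumed, shortened sample.
sample = io.StringIO(
    "Neighborhood,mattend10,mattend11,drop10,drop11\n"
    "Downtown,1,2,3,4\n"
)

# The first row parsed by csv.reader is the list of column names.
header = next(csv.reader(sample))

# fullmatch requires the whole name to be "drop" followed by optional digits.
pattern = re.compile(r"drop\d*")
drop_columns = [name for name in header if pattern.fullmatch(name.strip())]

print(len(drop_columns), drop_columns)  # 2 ['drop10', 'drop11']
```

This counts only names in the header, so a value containing "drop" further down the file would not inflate the result.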

Upvotes: 1
