Reputation: 27
I have two lists: the first contains regex patterns and the second contains values. I want to match each pattern in the first list against the values in the second list and count how many values each pattern matches. Finally, I want to put those counts into a new dataframe.
Here is a breakdown:
List 1:
regex_list = ['Error: Look ','Parking Charge Notice', '^Follow Up$']
List 2:
value_list = ['Follow Up','abc123','abc123', 'Error: Look', 'Follow Up']
I want the new dataframe's output to look like:
pattern, count
'Error: Look', 1
'^Follow Up$', 2
'Parking Charge Notice', 0
As you can see, the new dataframe shows each pattern from list 1 and how many times it matched in list 2.
Here is my Python so far:
import re
regex_list = ['Error: Look ', 'Parking Charge Notice', '^Follow Up$']
value_list = ['Follow Up', 'abc123', 'abc123', 'Error: Look', 'Follow Up']
p = re.compile(r'^Follow Up$')
matches = p.findall(value_list)
Here is my output:
Traceback (most recent call last):
  File "C:/Users/e136320/PycharmProjects/scrape_imsva_v2/working/regex_test.py", line 35, in <module>
    matches = p.findall(value_list)
TypeError: expected string or bytes-like object
I receive the error shown above. Is there a way to automatically loop through my regex list, check each pattern against value_list, and then put each pattern and its count in a dataframe?
I know my code isn't much, but I am new to Python and dataframes and completely lost, so any ideas or suggestions would help.
Upvotes: 0
Views: 221
Reputation: 632
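You can try the following code:
import re
import pandas as pd

regex_list = ['Error: Look', 'Parking Charge Notice', '^Follow Up$']
value_list = ['Follow Up', 'abc123', 'abc123', 'Error: Look', 'Follow Up']

# Collect one row per (pattern, value) pair that matches.
# DataFrame.append was removed in pandas 2.0, so build the rows in a list first.
rows = []
for j in regex_list:
    p = re.compile(j)
    for i in value_list:
        # findall expects a single string, so check each value separately
        matches = p.findall(i)
        if len(matches) != 0:
            rows.append({'regex': j, 'matched': matches})
df = pd.DataFrame(rows, columns=['regex', 'matched'])
print(df)

# Count the matching values per pattern
count = df.groupby('regex')['matched'].count().reset_index()
count.columns = ['regex', 'count']
print(count)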
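Based on the error message you posted, the problem is that you are passing a list to findall, which expects a single string (or bytes-like object). That is why the code above loops over value_list and checks one value at a time. Note that I also dropped the trailing space from 'Error: Look ' in regex_list, because the pattern with the space does not match the value 'Error: Look'.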
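The groupby approach only lists patterns that matched at least once, so 'Parking Charge Notice' would not show up with a count of 0 as in the desired output. If the zero rows matter, one possible variation is sketched below. The helper name count_matches and the column names 'pattern' and 'count' are my own choices for illustration, and 'Error: Look' is written without the trailing space, as in the answer's list.

```python
import re
import pandas as pd

regex_list = ['Error: Look', 'Parking Charge Notice', '^Follow Up$']
value_list = ['Follow Up', 'abc123', 'abc123', 'Error: Look', 'Follow Up']


def count_matches(patterns, values):
    """Return a DataFrame with one row per pattern and the number of values it matches.

    count_matches is a hypothetical helper, not part of re or pandas.
    """
    rows = []
    for pattern in patterns:
        compiled = re.compile(pattern)
        # search() takes one string at a time; count the values it matches
        n = sum(1 for value in values if compiled.search(value))
        # Append a row even when n == 0 so every pattern appears in the result
        rows.append({'pattern': pattern, 'count': n})
    return pd.DataFrame(rows, columns=['pattern', 'count'])


result = count_matches(regex_list, value_list)
print(result)
```

This counts each value at most once per pattern, which seems to be what the question asks for. If a pattern could match several times inside one value and each occurrence should count, len(compiled.findall(value)) could be summed instead.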
Upvotes: 1