raby1996

Reputation: 43

Using python to find repeated names in a txt file

I want to use Python (although any language is fine) to look through a structured text file that looks like this:

========= Weekend of 2016-12-02: ================
Schedule1:
bob@email
Schedule2:
john@email
bob@email
Schedule3:
Terry@email
========= Weekend of 2016-12-09: ================
Schedule1:
jake@email
Schedule2:
mike@email
bob@email
Schedule3:
howard@email

This pattern repeats for the remainder of the year. What I am trying to accomplish is to find any overlapping schedules: if bob@email is on more than one schedule for a given weekend, I would like to find and print that. Example:

Overlaps found for:
========= Weekend of 2016-12-02: ================
bob@email is scheduled for schedule1, and schedule2.

Since this is the only overlap, it is the only occurrence that would print. If there were more, they would print in the same format, one underneath the other. Is there any way to accomplish this?

The code I've found so far lets me find each weekend and print it, but I'm not sure how to look at the contents in more detail.

import re

def compare():
    # Print the date from every "Weekend of <date>:" header in the file.
    with open("weekends.txt", "r") as fp:
        for result in re.findall('Weekend of (.*?):', fp.read(), re.S):
            print(result)

compare()

This yields

2016-12-02
2016-12-09

Thank you, and please let me know if there are any questions.

Upvotes: 2

Views: 103

Answers (2)

dawg

Reputation: 103864

You can do something like this with regexes, building a dict of sets for each weekend:

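import re
from collections import Counter

fn = "weekends.txt"  # file name taken from the question
data = {}

with open(fn) as f_in:
    txt = f_in.read()

# Each block is one weekend: group(1) is the header text, group(2) the body.
for block in re.finditer(r'^=+\s+([^:]+:)\s=+\s+([^=]+)', txt, re.M):
    di = {}
    # Within a block, capture each schedule label and the lines that follow it.
    for sc in re.finditer(r'^(Schedule\s*\d+):\s*([\s\S]+?)(?=(?:^Schedule\s*\d+)|\Z)', block.group(2), re.M):
        di[sc.group(1)] = set(sc.group(2).splitlines())
    data[block.group(1)] = di

for date, DofS in data.items():
    # Count how many schedules each name appears on.
    c = Counter()
    for s in DofS.values():
        c += Counter(s)
    inverted = {k: [] for k, v in c.items() if v > 1}
    if not inverted:
        continue
    print(date)
    # Map each repeated name back to the schedules it appears on.
    for k in DofS:
        for e in DofS[k]:
            if e in inverted:
                inverted[e].append(k)
    print("\t", inverted)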

Prints:

Weekend of 2016-12-02:
    {'bob@email': ['Schedule1', 'Schedule2']}
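
The counting step relies on each schedule being stored as a set, so a name contributes at most once per schedule and any count above 1 means it appears on several schedules. A small isolated sketch of that idea, using hypothetical hand-written data shaped like the inner dict for one weekend (the variable names here are illustrative, not part of the code above):

```python
from collections import Counter

# Hypothetical data for one weekend: schedule label -> set of names.
schedules = {
    "Schedule1": {"bob@email"},
    "Schedule2": {"john@email", "bob@email"},
    "Schedule3": {"Terry@email"},
}

counts = Counter()
for names in schedules.values():
    counts.update(names)  # each name is counted once per schedule it is on

# Names that appear on more than one schedule.
repeated = sorted(name for name, n in counts.items() if n > 1)
print(repeated)  # ['bob@email']
```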

Upvotes: 1

Mingwei Lin

Reputation: 150

I think you can use a map to store <name, list of schedules> pairs, like <bob@email, [Schedule1]>, as you go through each weekend. Every time you want to add a new item, check whether the key has already been set. If yes, add that schedule to the corresponding list. If no, add a new item to the map. Then, when you print, only print the items with more than one schedule in the list.

For Python, you can use a dictionary as the map: https://www.tutorialspoint.com/python/python_dictionary.htm
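
The idea above could be sketched roughly like this. It is only a minimal sketch, assuming the file layout shown in the question (header lines starting with "=", schedule labels ending in ":", one name per line); the function name find_overlaps and the SAMPLE text are made up for illustration.

```python
def find_overlaps(lines):
    """Return {weekend header: {name: [schedules]}} for names on 2+ schedules."""
    weekends = {}    # weekend header -> {name: [schedule labels]}
    by_name = None   # the name -> schedules map of the weekend being read
    schedule = None  # the schedule label currently being read

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("="):       # header line: a new weekend starts
            by_name = weekends.setdefault(line, {})
            schedule = None
        elif line.endswith(":"):       # label line such as "Schedule1:"
            schedule = line[:-1]
        elif by_name is not None and schedule is not None:
            # Reuse the list if the key is already set, else create a new one.
            seen = by_name.setdefault(line, [])
            if schedule not in seen:
                seen.append(schedule)

    # Keep only the names that ended up with more than one schedule.
    result = {}
    for header, names in weekends.items():
        repeated = {n: s for n, s in names.items() if len(s) > 1}
        if repeated:
            result[header] = repeated
    return result


# Assumed sample input, copied from the layout in the question.
SAMPLE = """\
========= Weekend of 2016-12-02: ================
Schedule1:
bob@email
Schedule2:
john@email
bob@email
Schedule3:
Terry@email
========= Weekend of 2016-12-09: ================
Schedule1:
jake@email
Schedule2:
mike@email
bob@email
Schedule3:
howard@email
"""

print("Overlaps found for:")
for header, names in find_overlaps(SAMPLE.splitlines()).items():
    print(header)
    for name, schedules in names.items():
        print(f"{name} is scheduled for {', '.join(schedules)}.")
```

To run it against a real file, passing the open file object works as well, since the function only iterates over lines: find_overlaps(open("weekends.txt")).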

Upvotes: 0
