J.Todd

Reputation: 827

How can I match a regex pattern and return captured groups without including empty strings for the groups that aren't captured?

As an assignment I was faced with the question:

Given an input string similar to the below, craft a regular expression pattern to match and extract the date, time, and temperature in groups and return this pattern. Samples given below.

Date: 12/31/1999 Time: 11:59 p.m. Temperature: 44 F
Date: 01/01/2000 Time: 12:01 a.m. Temperature: 5.2 C

So I opened regex101 and created this pattern which tests correctly:

import re

def q6(strng):
    # Raw string, so \d and \/ reach the regex engine unchanged
    pattern = r'((?<=Date: )\d{1,2}\/\d{1,2}\/\d{4})|((?<=Time: )?\d{1,2}:\d{1,2} ?[pPaA].?[mM].?)|((?<=Temperature: )\d{1,3}.?\d{1,3} ?[CF])'
    print(re.findall(pattern, strng))
    return pattern

q6("Date: 12/31/1999 Time: 11:59 p.m. Temperature: 44 F")
q6("Date: 01/01/2000 Time: 12:01 a.m. Temperature: 5.2 C")

But in Python the pattern seems to give a flawed result:

[('12/31/1999', '', ''), ('', '11:59 p.m.', ''), ('', '', '44 F')]
[('01/01/2000', '', ''), ('', '12:01 a.m.', ''), ('', '', '5.2 C')]

You can see the extra empty items in the returned tuples. This question will be graded by a program, and note that it asks for the pattern to be returned, not the result, so I cannot trim the output afterwards.

Am I just using the wrong regex match function or what have I done wrong?

Upvotes: 1

Views: 870

Answers (2)

hc_dev

Reputation: 9377

TL;DR: You already named a solution in the last sentence of your question: the match function 😉

The tuples printed seem to be correct findings

From the docs re.findall(pattern, string, flags=0):

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

Changed in version 3.7: Non-empty matches can now start just after a previous empty match.

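The part that fits your case is the sentence about groups: your pattern has three groups, so every match is returned as a 3-tuple, with an empty string for each group that did not take part in that match.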
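A small illustration of that documented behavior, using a shortened sample and two hypothetical patterns (not the ones from the assignment): without groups, re.findall returns plain strings; with alternated groups, each tuple has an empty slot for the alternative that did not match.

```python
import re

sample = "Date: 12/31/1999 Time: 11:59 p.m."

# No capturing groups: findall returns the whole matches as strings.
no_groups = re.findall(r"\d{1,2}/\d{1,2}/\d{4}|\d{1,2}:\d{2}", sample)

# Two groups joined by "|": each match fills only one group,
# so the other slot of every tuple is an empty string.
two_groups = re.findall(r"(\d{1,2}/\d{1,2}/\d{4})|(\d{1,2}:\d{2})", sample)

print(no_groups)   # ['12/31/1999', '11:59']
print(two_groups)  # [('12/31/1999', ''), ('', '11:59')]
```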

Analyze the requirements or acceptance criteria

Simply split the task text at each "and" to get these three requirements:

  1. craft a regular expression pattern to match
  2. extract the date, time, and temperature in groups
  3. return this pattern

Broken down into sub-tasks, the problem becomes much easier to solve, and the steps themselves will guide you to the solution.

This problem-solving strategy is known as divide and conquer.

Clues towards a solution

Now try to solve step by step, starting with (1), then (2), finally (3).

  1. (a) Craft a regex string (r''), (b) compile it to a pattern, and (c) match it against the input.
  2. If the string matches, the groups (all three parts, each inside parentheses) can be extracted all at once.
  3. Clarify what exactly to return: a compiled pattern object or the regex as a string.
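One way these three steps might fit together is sketched below. The regex itself, the name FIELDS_REGEX, and the helper extract are assumptions for illustration only; they are not taken from the assignment and may not be what the grader expects.

```python
import re

# (1a) Hypothetical regex string: one capturing group per field,
# all three fields inside a single match instead of three alternatives.
FIELDS_REGEX = (
    r"Date: (\d{1,2}/\d{1,2}/\d{4}) "
    r"Time: (\d{1,2}:\d{2} ?[aApP]\.?[mM]\.?) "
    r"Temperature: (\d+(?:\.\d+)? ?[CF])"
)

# (1b) Compile the string to a pattern object.
pattern = re.compile(FIELDS_REGEX)

def extract(line):
    """(1c) Match, then (2) return all groups at once, or None if no match."""
    match = pattern.search(line)
    return match.groups() if match else None

print(extract("Date: 12/31/1999 Time: 11:59 p.m. Temperature: 44 F"))
print(extract("Date: 01/01/2000 Time: 12:01 a.m. Temperature: 5.2 C"))
```

Because every group takes part in the same match, re.findall with this sketch would return a single 3-tuple per line without empty strings. Step (3) stays open: depending on the grader, either FIELDS_REGEX (the string) or pattern (the compiled object) would be returned.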

Sorry that I haven't presented you with a finished solution, but you are very close. As far as I can see, these clues will get you there.

I gave you a step-by-step recipe plus keywords that you can use to search on Stack Overflow:

[python] regex extract groups

They are all in your given task:

Given an input string similar to the below, craft a regular expression pattern to match and extract the date, time, and temperature in groups and return this pattern. Samples given below.

From my experience in crafting

Analyzing the problem, identifying keywords, and clarifying broad or vague specifications, so that you are able to research and collect the ingredients, make up 80% of designing software. Cooking and coding fill the remaining 20%.

Upvotes: 1

tomo_iris427

Reputation: 158

Add ?: inside the opening parenthesis of every group you don't want to capture: (?:.....)|(?:....)|(?:...)
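A minimal sketch of this suggestion, assuming a simplified variant of the pattern from the question (the exact sub-patterns here are hypothetical). With only non-capturing groups, re.findall returns the whole matches as plain strings, so there are no tuples and no empty slots.

```python
import re

# Hypothetical simplified pattern: three alternatives, none of them capturing.
pattern = (
    r"(?:(?<=Date: )\d{1,2}/\d{1,2}/\d{4})"
    r"|(?:(?<=Time: )\d{1,2}:\d{2} ?[aApP]\.?[mM]\.?)"
    r"|(?:(?<=Temperature: )\d+(?:\.\d+)? ?[CF])"
)

first = re.findall(pattern, "Date: 12/31/1999 Time: 11:59 p.m. Temperature: 44 F")
second = re.findall(pattern, "Date: 01/01/2000 Time: 12:01 a.m. Temperature: 5.2 C")

print(first)   # ['12/31/1999', '11:59 p.m.', '44 F']
print(second)  # ['01/01/2000', '12:01 a.m.', '5.2 C']
```

One caveat: with no capturing groups, match.groups() is empty, so whether this satisfies the "in groups" wording of the assignment depends on how the grader checks the pattern.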

Upvotes: 2
