Reputation: 7411
I'm trying to parse data from a text file. Each data tuple is an age followed by 0 to 3 times, which are 'right' aligned. No matter how many times follow an age in the source data, I want to "pad" the tuple with None up to three times.
Ages and times are all space separated; furthermore, times are either of the format "mm:ss.dd" or "ss.dd". An age-plus-times group can repeat one or more times in a single line.
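One way to pin down the two time formats is a small pattern that classifies a single token as a time or not. This is a sketch, assuming seconds and hundredths are always two digits and minutes are one or two digits (the sample data contains 1:02.44); `TIME_RE` and `is_time` are illustrative names, not part of any library.

```python
import re

# Assumed token format: optional one- or two-digit minutes plus ':',
# then two-digit seconds, a literal '.', and two-digit hundredths.
TIME_RE = re.compile(r"(?:\d{1,2}:)?\d{2}\.\d{2}")


def is_time(token):
    """Return True for tokens like '22.10' or '1:02.44', False for ages like '25'."""
    return TIME_RE.fullmatch(token) is not None


for token in ["25", "22.10", "1:02.44", "59.35"]:
    print(token, is_time(token))
```

Using `fullmatch` rather than `search` means the whole token has to fit the pattern, so an age such as 25 is never mistaken for part of a time.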
Here is some example data:
test_str = ['25',
'24 22.10',
'16 59.35 1:02.44',
'18 52.78 59.45 1:01.22',
'33 59.35 1:02.44 34 52.78 59.45 1:01.22 24 25']
Once parsed, the above should produce tuples (or lists, dicts, ... whatever):
(25, None, None, None)
(24, None, None, 0:22.10)
(16, None, 0:59.35, 1:02.44)
(18, 0:52.78, 0:59.45, 1:01.22)
(33, None, 0:59.35, 1:02.44), (34, 0:52.78, 0:59.45, 1:01.22), (24, None, None, None), (25, None, None, None)
My thought was to use a regular expression, something along the lines of:
data_search = r'[1-9][0-9]( (([1-9][0-9]:)?[0-9]{2}.[0-9]{2})|){3}'
x = re.search(data_search, test_str[0])
But I'm not having any success.
Could somebody help me with the regex or suggest a better solution?
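A quick check of the attempt above with Python's `re` module, using the pattern unchanged, shows three separate problems: `[1-9][0-9]:` demands a two-digit minute, so 1:02.44 is never taken as a time; the age `[1-9][0-9]` is not anchored, so it can start inside a time; and the unescaped `.` matches any character, including a space. On top of that, `re.search` only returns the first match on a line.

```python
import re

test_str = ['25',
            '24 22.10',
            '16 59.35 1:02.44',
            '18 52.78 59.45 1:01.22',
            '33 59.35 1:02.44 34 52.78 59.45 1:01.22 24 25']

# The pattern from the question, unchanged.
data_search = r'[1-9][0-9]( (([1-9][0-9]:)?[0-9]{2}.[0-9]{2})|){3}'

# "[1-9][0-9]:" needs a two-digit minute, so " 1:02.44" is never taken as a time
# and the match ends after the first time.
print(re.search(data_search, test_str[2]).group(0))  # 16 59.35

# re.search stops at the first match; re.finditer shows every match on the line.
# The unanchored age can start inside a time ("44" from "1:02.44"), and the
# unescaped "." matches a space, so "34 52" is accepted as a time.
print([m.group(0) for m in re.finditer(data_search, test_str[4])])
# ['33 59.35', '44 34 52', '78 59.45', '22 24 25']
```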
Upvotes: 0
Views: 62
Reputation: 11
>>> import re
>>> age_expr = r"(\d+)"
>>> time_expr = r"((?:\s+)(?:\d+:)?\d+\.\d+)?"
>>> expr = re.compile(age_expr + time_expr * 3)
>>> [expr.findall(s) for s in test_str]
[[('25', '', '', '')], [('24', ' 22.10', '', '')], [('16', ' 59.35', ' 1:02.44', '')], [('18', ' 52.78', ' 59.45', ' 1:01.22')], [('33', ' 59.35', ' 1:02.44', ''), ('34', ' 52.78', ' 59.45', ' 1:01.22'), ('24', '', '', ''), ('25', '', '', '')]]
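The `findall` tuples above are left-aligned, keep the whitespace captured in front of each time, and use '' rather than None for missing times. A minimal post-processing sketch, assuming the same expression and the question's `test_str`, turns them into the right-aligned, None-padded shape from the question (`right_align` is a hypothetical helper name):

```python
import re

test_str = ['25',
            '24 22.10',
            '16 59.35 1:02.44',
            '18 52.78 59.45 1:01.22',
            '33 59.35 1:02.44 34 52.78 59.45 1:01.22 24 25']

# Same expression as in the answer: an age followed by up to three optional times.
age_expr = r"(\d+)"
time_expr = r"((?:\s+)(?:\d+:)?\d+\.\d+)?"
expr = re.compile(age_expr + time_expr * 3)


def right_align(groups):
    """Turn one findall tuple into (age, t1, t2, t3), padded with None on the left."""
    age, *raw_times = groups
    # Unmatched optional groups come back as '', and matched ones keep their
    # leading whitespace, so drop the empties and strip the rest.
    times = [t.strip() for t in raw_times if t]
    return (age,) + (None,) * (3 - len(times)) + tuple(times)


rows = [[right_align(g) for g in expr.findall(s)] for s in test_str]
for row in rows:
    print(row)
```

The optional groups always fill from left to right, so any empty groups sit at the end of each tuple and moving the padding to the front is enough to right-align the times.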
Upvotes: 0
Reputation: 510
I believe this is close to what you want. Sorry for not using a regex.
def format_str(test_str):
    res = []
    for x in test_str:
        parts = x.split(" ")
        thing = []
        for part in parts:
            if len(thing) != 0 and '.' not in part and ':' not in part:
                res.append(thing[:1] + [None]*(4-len(thing)) + thing[1:])
                thing = [part]
            else:
                thing.append(part)
        if len(thing) != 0:
            res.append(thing[:1] + [None]*(4-len(thing)) + thing[1:])
    return res
test_str = ['25',
'24 22.10',
'16 59.35 1:02.44',
'18 52.78 59.45 1:01.22 24 22.10']
results = format_str(test_str)
print(results)
The result is:
[['25', None, None, None], ['24', None, None, '22.10'], ['16', None, '59.35', '1:02.44'], ['18', '52.78', '59.45', '1:01.22'], ['24', None, None, '22.10']]
I didn't do any formatting on the times, so 52.78 isn't shown as 0:52.78, but I bet you can do that. If not, leave a comment and I'll edit in a solution for that too.
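For the time formatting, a minimal sketch, assuming every time without a ':' is "ss.dd" and should simply get a "0:" prefix as in the question's expected output (`normalize_time` and `formatted` are hypothetical names), applied to the result list printed above:

```python
def normalize_time(t):
    """Render 'ss.dd' as '0:ss.dd'; leave None and 'm:ss.dd' values unchanged."""
    if t is None or ":" in t:
        return t
    return "0:" + t


results = [['25', None, None, None], ['24', None, None, '22.10'],
           ['16', None, '59.35', '1:02.44'], ['18', '52.78', '59.45', '1:01.22'],
           ['24', None, None, '22.10']]

# Keep the age (first element) as-is and normalise only the three time slots.
formatted = [[row[0]] + [normalize_time(t) for t in row[1:]] for row in results]
print(formatted)
```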
Upvotes: 1
Reputation: 2582
I'm not sure this is the best approach, but it splits off the first element, since that is always in the first position, and then takes the remaining elements and fills the gaps in front of them with None.
test_str = ['25',
'24 22.10',
'16 59.35 1:02.44',
'18 52.78 59.45 1:01.22']
def create_tuples(string_list):
    all_tuples = []
    for space_string in string_list:
        if not space_string:
            continue
        split_list = space_string.split()
        first_list_element = split_list[0]
        last_list_elements = split_list[1:]
        all_tuples.append([first_list_element] + [None] * (3 - len(last_list_elements)) + last_list_elements)
    return all_tuples
print(create_tuples(test_str))
# Returns:
[['25', None, None, None], ['24', None, None, '22.10'], ['16', None, '59.35', '1:02.44'], ['18', '52.78', '59.45', '1:01.22']]
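`create_tuples` assumes one age per line. On the question's fifth line, `[None] * (3 - 8)` is just `[]`, so that line would come back as a single 9-element list. A minimal extension in the same split-and-pad style, assuming every line starts with an age and that any token without a '.' is an age (true for the sample data), first cuts each line into groups; `split_groups` is a hypothetical helper:

```python
def split_groups(space_string):
    """Split one line into [age, time, ...] groups.

    Assumes every line starts with an age and that any token without a '.'
    is an age (true for the sample data, where every time contains '.').
    """
    groups = []
    for token in space_string.split():
        if "." not in token:
            groups.append([token])      # an age starts a new group
        else:
            groups[-1].append(token)    # a time joins the current group
    return groups


def create_tuples(string_list):
    all_tuples = []
    for space_string in string_list:
        # An empty line yields no groups, so no explicit skip is needed.
        for group in split_groups(space_string):
            age, times = group[0], group[1:]
            all_tuples.append([age] + [None] * (3 - len(times)) + times)
    return all_tuples


test_str = ['25',
            '24 22.10',
            '16 59.35 1:02.44',
            '18 52.78 59.45 1:01.22',
            '33 59.35 1:02.44 34 52.78 59.45 1:01.22 24 25']
print(create_tuples(test_str))
```

This returns one flat list of padded groups; if the groups should stay together per input line, append `split_groups` results per line instead of extending a single list.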
Upvotes: 1