Reputation: 175
How to extract the strings between a date and the first occurrence of a digit from a nested list in Python 3?
nested_list = [["22/01/2014","a","b5","c","d","1","2.5","3.3"],["e","f"],
               ["25/12/1969","g","h","4","5"],["j","k"]]

def find_in_list_of_list(mylist, char):
    for sub_list in mylist:
        if char in sub_list:
            return (mylist.index(sub_list), sub_list.index(char))
    raise ValueError("'{char}' is not in list".format(char = char))

output = find_in_list_of_list(nested_list, "22/01/2014")
print(output,"first_date_index")
output = find_in_list_of_list(nested_list, "1")
print(output,"first_digit_index")
output = find_in_list_of_list(nested_list, "25/12/1969")
print(output,"second_date_index")
output = find_in_list_of_list(nested_list, "4")
print(output,"second_digit_index")
Expected Output:
[ ["a","b5","c","d"],["g","h"]]
Upvotes: 4
Views: 133
Reputation: 4315
The re.search() function takes a pattern and the text to scan. It returns a match object if the pattern is found anywhere in the text; otherwise it returns None.
The str.isdigit() method returns True if all characters in the string are digits; otherwise it returns False.
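As a quick illustration of the two calls described above, here is a minimal sketch. The variable name date_pattern is only for this example; the pattern itself is the one used in the code that follows.

```python
import re

# re.search returns a match object on success and None on failure.
date_pattern = r'\d+/\d+/\d+'
print(re.search(date_pattern, "22/01/2014") is not None)  # True
print(re.search(date_pattern, "b5"))                      # None

# str.isdigit is True only when every character is a digit.
print("1".isdigit())    # True
print("b5".isdigit())   # False
print("2.5".isdigit())  # False: the "." is not a digit
```

Note that isdigit() rejects "2.5", so with this approach a decimal string would be collected rather than treated as the stopping point.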
import re
nested_list = [["22/01/2014","a","b5","c","d","1","2.5","3.3"],["e","f"],
               ["25/12/1969","g","h","4","i"],["j","k"]]
new_list = []
for i in nested_list:
    reg = False
    inner_list = []
    for j in i:
        match = re.search(r'(\d+/\d+/\d+)', j)
        if match is not None:
            reg = True
            continue
        if reg and str.isdigit(j):
            new_list.append(inner_list)
            break
        elif reg and not str.isdigit(j):
            inner_list.append(j)
print(new_list)
Output:
[['a', 'b5', 'c', 'd'], ['g', 'h']]
Upvotes: 1
Reputation: 11238
import re
# Matches dates such as 22/01/2014 (note the slash before the year)
reg = re.compile(r'\d{2}/\d{2}/\d{4}')
nested_list = [["22/01/2014","a","b5","c","d","1","2.5","3.3"],["e","f"],
               ["25/12/1969","g","h","4","i"],["j","k"]]
for i, _list in enumerate(nested_list):
    d_index = 0
    i_index = 0
    for j, _str in enumerate(_list):
        if reg.findall(_str) != []:
            d_index = j
        else:
            try:
                # float() raises ValueError for non-numeric strings;
                # calling it without a truth test also catches "0"
                float(_str)
                i_index = j
                break
            except ValueError:
                pass
    if d_index < i_index:
        print(_list[d_index+1:i_index])
Output:
['a', 'b5', 'c', 'd']
['g', 'h']
Upvotes: 1
Reputation: 26538
Here is my take: try to parse each string as a date, then as a float. If it is neither, remember the string, and stop as soon as a float is met.
from datetime import datetime
nested_list = [["22/01/2014","a","b5","c","d","1","2.5","3.3"],["e","f"], ["25/12/1969","g","h","4","5"],["j","k"]]
result = []
for in_list in nested_list:
    temp_holder = []
    for string in in_list:
        try:
            # A date is skipped, not collected
            datetime.strptime(string, '%d/%m/%Y')
        except ValueError:
            try:
                # The first numeric string ends the collection for this list
                float(string)
                if temp_holder:
                    result.append(temp_holder)
                break
            except ValueError:
                # Neither a date nor a number: remember it
                temp_holder.append(string)
print(result)
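The same parse-as-date-then-float idea can be wrapped in a reusable function. This is only a sketch: the helper names is_date, is_number and strings_between_date_and_number are made up for illustration, and unlike the loop above it assumes that collection should start only after a date has been seen in the list.

```python
from datetime import datetime


def is_date(text, fmt="%d/%m/%Y"):
    """Return True if text parses as a date in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def is_number(text):
    """Return True if float() accepts text (this includes 'nan' and 'inf')."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def strings_between_date_and_number(rows):
    """Collect, per row, the strings after a date and before the first number."""
    result = []
    for row in rows:
        collected = []
        seen_date = False
        for item in row:
            if is_date(item):
                seen_date = True
                collected = []      # restart collection at each date
            elif not seen_date:
                continue            # ignore anything before the date
            elif is_number(item):
                if collected:
                    result.append(collected)
                break
            else:
                collected.append(item)
    return result


nested_list = [["22/01/2014", "a", "b5", "c", "d", "1", "2.5", "3.3"], ["e", "f"],
               ["25/12/1969", "g", "h", "4", "5"], ["j", "k"]]
print(strings_between_date_and_number(nested_list))
# [['a', 'b5', 'c', 'd'], ['g', 'h']]
```

Because float() is used instead of isdigit(), a decimal such as "2.5" also ends the collection.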
Upvotes: 1
Reputation: 88236
Here's an itertools-based approach:
from itertools import takewhile, islice
[list(takewhile(lambda x: not str.isdigit(x), islice(i,1,None))) for i in nested_list[::2]]
# [['a', 'b5', 'c', 'd'], ['g', 'h']]
takewhile from itertools is useful in cases where we want to return values from an iterable for as long as a condition holds; here, the condition is that a given string is not numeric. As soon as the first all-digit string is encountered, no more items are taken from the iterable.
I'm also using islice here to take from index 1 onwards, which skips the initial date.
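To see the two pieces separately, here is a minimal sketch on a single list, followed by a variant of the comprehension. The one-liner above relies on nested_list[::2], which assumes the date lists sit at even indices; the variant below instead keeps only lists whose first item looks like a date. The names row, date_re and result are made up for this example.

```python
import re
from itertools import islice, takewhile

row = ["22/01/2014", "a", "b5", "c", "d", "1", "2.5", "3.3"]

# islice(row, 1, None) skips index 0 (the date) and yields the rest lazily.
after_date = islice(row, 1, None)

# takewhile stops at the first item for which the predicate is False,
# i.e. the first all-digit string.
between = list(takewhile(lambda s: not s.isdigit(), after_date))
print(between)  # ['a', 'b5', 'c', 'd']

# Variant: select rows by their first element instead of by position.
nested_list = [["22/01/2014", "a", "b5", "c", "d", "1", "2.5", "3.3"], ["e", "f"],
               ["25/12/1969", "g", "h", "4", "5"], ["j", "k"]]
date_re = re.compile(r'\d{2}/\d{2}/\d{4}')
result = [
    list(takewhile(lambda s: not s.isdigit(), islice(r, 1, None)))
    for r in nested_list
    if r and date_re.fullmatch(r[0])
]
print(result)  # [['a', 'b5', 'c', 'd'], ['g', 'h']]
```

Since the predicate uses isdigit(), only whole-number strings such as "1" or "4" stop the iteration; a decimal such as "2.5" appearing first would be taken as an ordinary string.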
Upvotes: 2