suba

Reputation: 175

How to extract string between date and first occurrence of digit from nested list in python 3?

nested_list = [["22/01/2014","a","b5","c","d","1","2.5","3.3"],["e","f"], 
                                   ["25/12/1969","g","h","4","5"],["j","k"]]


def find_in_list_of_list(mylist, char):
    # Return (outer index, inner index) of the first sub-list containing char.
    for outer_index, sub_list in enumerate(mylist):
        if char in sub_list:
            return (outer_index, sub_list.index(char))
    raise ValueError("'{char}' is not in list".format(char=char))

output = find_in_list_of_list(nested_list, "22/01/2014")
print(output,"first_date_index")
output = find_in_list_of_list(nested_list, "1")
print(output,"first_digit_index")

output = find_in_list_of_list(nested_list, "25/12/1969")
print(output,"second_date_index")
output = find_in_list_of_list(nested_list, "4")
print(output,"second_digit_index")

Expected Output:
[ ["a","b5","c","d"],["g","h"]]

Upvotes: 4

Views: 133

Answers (4)

bharatk

Reputation: 4315

re.search() takes a pattern and the string to scan, and returns a match object if the pattern is found anywhere in the string; otherwise it returns None.

str.isdigit() returns True if the string is non-empty and all of its characters are digits; otherwise it returns False.
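As a quick illustration of how these two calls behave on the kinds of strings in the question, here is a minimal sketch. The names `samples`, `date_pattern` and `results` are only for this example and do not appear in the code below.

```python
import re

# Sample strings taken from the list in the question.
samples = ["22/01/2014", "b5", "1", "2.5"]

# Same idea as the pattern used in the loop: digits separated by slashes.
date_pattern = re.compile(r"\d+/\d+/\d+")

# For each string: (string, looks like a date?, consists only of digits?)
results = [
    (s, date_pattern.search(s) is not None, s.isdigit())
    for s in samples
]

for row in results:
    print(row)
```

Note that "2.5".isdigit() is False because of the dot, so the loop stops at the first integer-like string ("1" or "4"), which is what the expected output needs here.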

import re
nested_list = [["22/01/2014","a","b5","c","d","1","2.5","3.3"],["e","f"],
                                   ["25/12/1969","g","h","4","i"],["j","k"]]

new_list =[]
for i in nested_list:
    reg = False
    inner_list = []
    for j in i:
        match = re.search(r'(\d+/\d+/\d+)',j)
        if match is not None:
            reg = True
            continue

        if reg and str.isdigit(j):
            new_list.append(inner_list)
            break
        elif reg and not str.isdigit(j):
            inner_list.append(j)

print(new_list)

Output:

[['a', 'b5', 'c', 'd'], ['g', 'h']]

Upvotes: 1

sahasrara62

Reputation: 11238

import re

# dd/mm/yyyy: two digits, slash, two digits, slash, four digits
reg = re.compile(r'\d{2}/\d{2}/\d{4}')

nested_list = [["22/01/2014","a","b5","c","d","1","2.5","3.3"],["e","f"], 
                                   ["25/12/1969","g","h","4","i"],["j","k"]]


for i,_list in enumerate(nested_list):
    d_index=0
    i_index=0
    for j, _str in enumerate(_list):
        if reg.findall(_str) !=[]:
            d_index=j
        else:
            try:
                # Only the conversion matters; testing the value would skip "0".
                float(_str)
            except ValueError:
                pass
            else:
                i_index=j
                break
    if d_index<i_index:
        print(_list[d_index+1:i_index])

Output:

['a', 'b5', 'c', 'd']
['g', 'h']

Upvotes: 1

James Lin

Reputation: 26538

Here is my take: try to parse each string as a date, then as a float. If it is neither, remember the string until the first float is met.

from datetime import datetime
nested_list = [["22/01/2014","a","b5","c","d","1","2.5","3.3"],["e","f"], ["25/12/1969","g","h","4","5"],["j","k"]]

result = []
for in_list in nested_list:
    temp_holder = []
    for string in in_list:
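        try:
            datetime.strptime(string, '%d/%m/%Y')
        except ValueError:
            # Not a date: check whether it is a number.
            try:
                float(string)
                if temp_holder:
                    result.append(temp_holder)
                break
            except ValueError:
                # Neither a date nor a number: keep the string.
                temp_holder.append(string)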

print(result)
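The same idea can also be sketched as a reusable function. This is a variation, not the exact loop above: the helper names `is_date`, `is_number` and `strings_after_date` are made up for this example, and it only starts collecting once a date has actually been seen in the row.

```python
from datetime import datetime


def is_date(text, fmt="%d/%m/%Y"):
    """Return True if text parses as a date in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def is_number(text):
    """Return True if float() accepts text."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def strings_after_date(rows):
    """Collect, per row, the strings between a date and the first number."""
    result = []
    for row in rows:
        collected = []
        seen_date = False
        for item in row:
            if is_date(item):
                # Start (or restart) collecting after a date.
                seen_date = True
                collected = []
            elif is_number(item):
                # First number ends the row; keep what was collected, if any.
                if seen_date and collected:
                    result.append(collected)
                break
            elif seen_date:
                collected.append(item)
    return result


nested_list = [["22/01/2014", "a", "b5", "c", "d", "1", "2.5", "3.3"], ["e", "f"],
               ["25/12/1969", "g", "h", "4", "5"], ["j", "k"]]
print(strings_after_date(nested_list))
```

Rows without a date, or without a number after the date, contribute nothing to the result.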

Upvotes: 1

yatu

Reputation: 88236

Here's an itertools based approach:

from itertools import takewhile, islice

[list(takewhile(lambda x: not str.isdigit(x), islice(i,1,None))) for i in nested_list[::2]]
# [['a', 'b5', 'c', 'd'], ['g', 'h']]

takewhile from itertools is useful in cases where we want to take values from an iterable for as long as a condition holds, in this case that a given string is not numeric. Hence, as soon as the first all-digit string is encountered, no more items are taken from the iterable.

I'm also using islice here to take from the second item onwards (index 1) in order to skip the initial date. The nested_list[::2] slice assumes the lists that start with a date sit at the even positions, as they do in the question.
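To see the two pieces separately, here is a small sketch on a single row from the question. The variable names `row`, `after_date` and `between` are only for illustration.

```python
from itertools import islice, takewhile

row = ["22/01/2014", "a", "b5", "c", "d", "1", "2.5", "3.3"]

# islice(row, 1, None) skips index 0 (the date) and yields the rest lazily.
after_date = islice(row, 1, None)

# takewhile yields items while the predicate is true and stops at the
# first item for which it is false ("1" in this row).
between = list(takewhile(lambda s: not s.isdigit(), after_date))

print(between)
```

Because takewhile stops at the first failing item, the later "2.5" and "3.3" are never examined.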

Upvotes: 2
