Maria628

Reputation: 234

Python: filter lines based on field match from another file

I have generated a list with my other Python code; it looks like the one below. Each element is a single-quoted string whose fields are separated by commas. I am trying to filter these lines by matching the D: field against another file that contains only the leading digits (prefixes) to look for.

data = ['A:SET, B:FW.O, C:AS, D:+18700000, E:+12355, F:ROOT', 'A:SET, B:IT, C:AS, D:+22211111, E:+12355, F:ROOT', 'A:SET, B:FW.O, C:AS, D:+177232, E:+12355', 'A:SET, B:IT, C:AS, D:+368399793, E:+12355']

Here is the same list with one string per line:

[
'A:SET, B:FW.O, C:AS, D:+18700000, E:+12355, F:ROOT', 
'A:SET, B:IT, C:AS, D:+22211111, E:+12355, F:ROOT', 
'A:SET, B:FW.O, C:AS, D:+177232, E:+12355', 
'A:SET, B:IT, C:AS, D:+368399793, E:+12355'
]

I have another file containing the prefixes to match against the list above:

cat fields.txt
+36
+18
#these are country prefixes
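If the "#these are country prefixes" line is actually stored in fields.txt (the question does not say), it has to be skipped when the prefixes are read, along with any blank lines. A minimal sketch, where parse_prefixes is a hypothetical helper name and io.StringIO stands in for the opened file so the snippet runs on its own:

```python
import io


def parse_prefixes(lines):
    """Return the prefixes from an iterable of lines, ignoring blanks and '#' comments."""
    prefixes = []
    for line in lines:
        line = line.strip()
        # Skip empty lines and comment lines such as "#these are country prefixes"
        if not line or line.startswith("#"):
            continue
        prefixes.append(line)
    return prefixes


# io.StringIO stands in for open("fields.txt") so the example is self-contained
sample = io.StringIO("+36\n+18\n#these are country prefixes\n")
print(parse_prefixes(sample))  # ['+36', '+18']
```

With the real file, passing the file object works the same way, because iterating over an open file yields it line by line: with open("fields.txt") as f: codes = parse_prefixes(f).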

I need to keep only the lines whose D: value starts with one of the prefixes in "fields.txt", and print them. The D: numbers in "data" change every time, so I have to filter on their country prefix.

Expected output:

[
'A:SET, B:FW.O, C:AS, D:+18700000, E:+12355, F:ROOT', ### matched: D: value starts with +18
'A:SET, B:IT, C:AS, D:+368399793, E:+12355'  ### matched: D: value starts with +36
]

I have already tried writing a for loop to match the numbers, but with no luck.

Please help me; I am new to Python programming.

Upvotes: 1

Views: 590

Answers (3)

exhuma

Reputation: 21757

You can do this with a list comprehension that includes an if condition. The benefit is that the logic deciding which lines to include or exclude can be tucked away in a separate function (matches in the example below).

Having a separate function makes the code easy to test: you can give it a docstring and check it in isolation, which also makes it more maintainable.

data = [
    "A:SET, B:FW.O, C:AS, D:+18700000, E:+12355, F:ROOT",
    "A:SET, B:IT, C:AS, D:+22211111, E:+12355, F:ROOT",
    "A:SET, B:FW.O, C:AS, D:+177232, E:+12355",
    "A:SET, B:IT, C:AS, D:+368399793, E:+12355",
]


def load_codes():
    with open("fields.txt") as fieldfile:
        # Skip blank lines: an empty code would make "D:" match every row
        codes = [line.strip() for line in fieldfile if line.strip()]
    return codes


def matches(row, codes):
    for code in codes:
        if "D:%s" % code in row:
            return True
    return False


def main():
    codes = load_codes()
    filtered = [row for row in data if matches(row, codes)]

    for row in filtered:
        print(row)


if __name__ == "__main__":
    main()
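Since the answer stresses that a separate matches function is easy to document and test, here is a sketch of what that could look like: the same substring check written with the built-in any(), a docstring, and a few plain assert checks against rows from the question. The codes are hard-coded instead of being read from fields.txt, purely so the snippet runs on its own.

```python
def matches(row, codes):
    """Return True if the row's D: field starts with any of the given codes.

    Uses the same substring test as the answer: for rows shaped like the
    question's data, "D:+18" appears only when the D: value begins with +18.
    """
    return any("D:%s" % code in row for code in codes)


codes = ["+36", "+18"]  # hard-coded for the example; normally from fields.txt

# Rows taken from the question's data list
assert matches("A:SET, B:FW.O, C:AS, D:+18700000, E:+12355, F:ROOT", codes)
assert matches("A:SET, B:IT, C:AS, D:+368399793, E:+12355", codes)
assert not matches("A:SET, B:IT, C:AS, D:+22211111, E:+12355, F:ROOT", codes)
assert not matches("A:SET, B:FW.O, C:AS, D:+177232, E:+12355", codes)
print("all checks passed")
```

any() stops at the first code that matches, which is the same early exit as the return True inside the original for loop.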

Upvotes: 2

dennohpeter

Reputation: 461

I don't think there is any need to split each item in the data list. You can simply do:

data = [
    'A:SET, B:FW.O, C:AS, D:+18700000, E:+12355, F:ROOT',
    'A:SET, B:IT, C:AS, D:+22211111, E:+12355, F:ROOT',
    'A:SET, B:FW.O, C:AS, D:+177232, E:+12355',
    'A:SET, B:IT, C:AS, D:+368399793, E:+12355'
]

with open("fields.txt") as f:
    codes = f.read().splitlines()

required = []
for item in data:
    for code in codes:
        if "D:%s" % code in item:
            required.append(item)
            break  # stop at the first matching code so an item is never added twice
print(required)

You will end up with

[
'A:SET, B:FW.O, C:AS, D:+18700000, E:+12355, F:ROOT',
'A:SET, B:IT, C:AS, D:+368399793, E:+12355'
]
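If you want to keep the no-splitting approach but make the check look only at the D: value, a regular expression can extract it directly. This is a sketch, assuming the D: value is always a "+" followed by digits as in the sample data. It also relies on str.startswith accepting a tuple of prefixes, so all codes are tested in one call and each row is kept at most once:

```python
import re

data = [
    'A:SET, B:FW.O, C:AS, D:+18700000, E:+12355, F:ROOT',
    'A:SET, B:IT, C:AS, D:+22211111, E:+12355, F:ROOT',
    'A:SET, B:FW.O, C:AS, D:+177232, E:+12355',
    'A:SET, B:IT, C:AS, D:+368399793, E:+12355',
]
codes = ["+36", "+18"]  # hard-coded here; normally read from fields.txt

# Capture the "+digits" value that follows "D:" (assumes that exact shape)
d_value = re.compile(r"\bD:(\+\d+)")

required = []
for item in data:
    m = d_value.search(item)
    # startswith() with a tuple is True if the value begins with any prefix
    if m and m.group(1).startswith(tuple(codes)):
        required.append(item)

print(required)
```

Rows without a D: field simply produce no match and are skipped, instead of raising an error.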

Upvotes: 1

Farzad Vertigo

Reputation: 2838

I think this solution suits your need:

with open("fields.txt") as f:
    codes = f.read().splitlines()

data = ['A:SET, B:FW.O, C:AS, D:+18700000, E:+12355, F:ROOT',
        'A:SET, B:IT, C:AS, D:+22211111, E:+12355, F:ROOT',
        'A:SET, B:FW.O, C:AS, D:+177232, E:+12355',
        'A:SET, B:IT, C:AS, D:+368399793, E:+12355']

for index, item in enumerate(data):
    sub_items = item.replace(" ", "").split(",")  # remove spaces and split into individual fields
    for sub_item in sub_items:  # you can replace this loop with sub_items[3] if the position of D: is fixed
        if sub_item.startswith("D:"):
            value = sub_item.replace("D:", "")  # value now holds the +xxxx number
            # apply the prefix logic here:
            for code in codes:
                if value.startswith(code):
                    print(code, value, index, data[index])

It prints the following lines if fields.txt contains the numbers you mentioned in the question:

+18 +18700000 0 A:SET, B:FW.O, C:AS, D:+18700000, E:+12355, F:ROOT
+36 +368399793 3 A:SET, B:IT, C:AS, D:+368399793, E:+12355
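Splitting, as this answer does, can be taken one step further: turn each row into a dictionary keyed by field name, so the D: value is looked up by name whatever its position. This is a sketch under the assumption that fields are separated by commas and each key is separated from its value by the first ":"; parse_row is a hypothetical helper name:

```python
def parse_row(row):
    """Turn 'A:SET, B:IT, D:+36...' into {'A': 'SET', 'B': 'IT', 'D': '+36...'}."""
    fields = {}
    for part in row.split(","):
        # partition() splits on the first ":" only, so values may contain ":" too
        key, _, value = part.strip().partition(":")
        fields[key] = value
    return fields


data = ['A:SET, B:FW.O, C:AS, D:+18700000, E:+12355, F:ROOT',
        'A:SET, B:IT, C:AS, D:+22211111, E:+12355, F:ROOT',
        'A:SET, B:FW.O, C:AS, D:+177232, E:+12355',
        'A:SET, B:IT, C:AS, D:+368399793, E:+12355']
codes = ("+36", "+18")  # hard-coded here; normally read from fields.txt

for index, item in enumerate(data):
    value = parse_row(item).get("D", "")  # "" when a row has no D: field
    if value.startswith(codes):
        print(index, item)
```

Using .get("D", "") means a row without a D: field is skipped rather than raising KeyError, since an empty string starts with none of the prefixes.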

Upvotes: 1
