Python matching data in two files

First of all, I would normally ask with sample code in hand, but I haven't been able to come up with even a partial solution to this problem.

I have two text files. The first one:

French,01
Brasil,07
USA,23
England,34
Egypt,51
...

The second one:

French
Paris
England
London
...

The first file has more entries than the second. My goal is to combine the data in the first file with the matching entries in the second, for example: England,London,34

So far, I've tried converting the files to lists and using map(), reduce(), startswith() and zip(), but they match the items either by position or seemingly at random. How can I solve this?

list1 = ['French','01', 'Brasil','07']
list2 = ['French','Paris','England','London']

list(zip(list1, list2))  # -> [('French', 'French'), ('01', 'Paris'), ('Brasil', 'England'), ('07', 'London')]

Upvotes: 1

Views: 162

Answers (3)

Thomas

You could also use itertools.zip_longest and csv.reader.

import csv
from itertools import zip_longest

with open("file1.txt") as file1, open("file2.txt") as file2:

    city_records = (line for line in file2.read().splitlines())

    cities = {
        country: city
        for country, city in zip_longest(city_records, city_records)
    }
    
    result = [
        [country, cities.get(country), *other_columns]
        for country, *other_columns in csv.reader(file1)
    ]

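Note: city_records must be an iterator (here a generator expression; iter(file2.read().splitlines()) would work too), because zip_longest then pulls two consecutive lines from the same source for each pair, which yields (country, city) pairs.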
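To see why an iterator is needed, here is a small standalone check. Passing the same iterator twice makes zip_longest consume two consecutive lines per tuple, whereas passing the list twice just pairs each line with itself. With an odd number of lines, zip_longest pads the last pair with None instead of dropping the country, which is presumably why it was preferred over plain zip here.

```python
from itertools import zip_longest

lines = ["French", "Paris", "England", "London"]

# One iterator passed twice: each tuple takes two consecutive lines
it = iter(lines)
paired = list(zip_longest(it, it))
print(paired)  # [('French', 'Paris'), ('England', 'London')]

# The list passed twice: each argument is iterated independently,
# so every line is paired with itself
not_paired = list(zip_longest(lines, lines))
print(not_paired[:2])  # [('French', 'French'), ('Paris', 'Paris')]

# Odd number of lines: the last country is kept and padded with None
it = iter(["French", "Paris", "England"])
odd = list(zip_longest(it, it))
print(odd)  # [('French', 'Paris'), ('England', None)]
```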


Here is an executable example using io.StringIO to simulate the files:

import csv
from io import StringIO
from itertools import zip_longest
from pprint import pprint

content1 = """French,01
Brasil,07
USA,23
England,34
Egypt,51"""

content2 = """French
Paris
England
London"""

with StringIO(content1) as file1, StringIO(content2) as file2:
    
    city_records = (line for line in file2.read().splitlines())

    cities = {
        country: city
        for country, city in zip_longest(city_records, city_records)
    }
        
    result = [
        [country, cities.get(country), *other_columns]
        for country, *other_columns in csv.reader(file1)
    ]
    
    pprint(result)

Output:

[['French', 'Paris', '01'],
 ['Brasil', None, '07'],
 ['USA', None, '23'],
 ['England', 'London', '34'],
 ['Egypt', None, '51']]
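If only the countries that appear in both files are wanted, written in the England,London,34 form from the question, the rows whose city is None can be filtered out before writing. A sketch, assuming the result list produced above; io.StringIO stands in for a real output file, and any file name you would use instead (for example a hypothetical merged.txt) is an assumption.

```python
import csv
import io

# Rows in the shape produced above: [country, city_or_None, code]
result = [['French', 'Paris', '01'],
          ['Brasil', None, '07'],
          ['USA', None, '23'],
          ['England', 'London', '34'],
          ['Egypt', None, '51']]

# Keep only countries that had a city in the second file
matched = [row for row in result if row[1] is not None]

# Stand-in for open("merged.txt", "w", newline="") with a real file
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(matched)
print(out.getvalue(), end="")
# French,Paris,01
# England,London,34
```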

Upvotes: 0

joanis

In a comment that's now gone, you said the correspondence between the two lists is by line number, so I'm assuming that in this answer.

If what you're trying to do is just insert the line from text2 between the two elements on the line from text1, then the solution below will work.

I suspect your actual need may be slightly different, but this should give you what you need to solve your issue.

text1 = """French,01
Brasil,07
USA,23
England,34
Egypt,51"""

text2 = """French
Paris
England
London"""

list1 = [line.split(",") for line in text1.split("\n")]
list2 = text2.split("\n")

for line1, line2 in zip(list1, list2):
    newline = [line1[0], line2, line1[1]]
    print(",".join(newline))

It's basically just a matter of parsing and wrangling the data into the structure you need; there's not much magic to it.
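One behaviour of the line-number approach worth knowing: zip() stops at the shorter input, so with five lines in text1 and only four in text2, the Egypt,51 line is silently dropped. Below is a minimal sketch of the same logic wrapped in a hypothetical merge_by_line helper that returns the rows so they can be inspected.

```python
def merge_by_line(text1, text2):
    """Hypothetical helper: pair line i of text1 with line i of text2."""
    rows = []
    # zip() stops as soon as the shorter of the two inputs is exhausted
    for line1, line2 in zip(text1.splitlines(), text2.splitlines()):
        country, code = line1.split(",")
        rows.append([country, line2, code])
    return rows

text1 = "French,01\nBrasil,07\nUSA,23\nEngland,34\nEgypt,51"
text2 = "French\nParis\nEngland\nLondon"

rows = merge_by_line(text1, text2)
for row in rows:
    print(",".join(row))
# French,French,01
# Brasil,Paris,07
# USA,England,23
# England,London,34
```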

Another possible answer

I just had a thought: maybe your second file is meant to alternate, with a key on one line and the value to use on the next?

If so, try this:

list1 = [line.split(",") for line in text1.split("\n")]
list2 = text2.split("\n")
dict2 = dict(zip(list2[::2], list2[1::2]))
for line1 in list1:
    mapped_first_field = dict2.get(line1[0], "key_not_found")
    newline = [line1[0], mapped_first_field, line1[1]]
    print(",".join(newline))

The key logic here is that dict(zip(list2[::2], list2[1::2])) builds a dict whose keys are the first, third, etc. lines of text2, and whose values are the lines that follow each of them: the slice list2[::2] picks out the elements at indices 0, 2, ..., and the slice list2[1::2] picks out those at indices 1, 3, ...
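A self-contained run of the alternating-lines version, using the same sample data, shows what each slice contains and what the loop prints, including the "key_not_found" fallback for countries missing from the second file:

```python
text1 = """French,01
Brasil,07
USA,23
England,34
Egypt,51"""

text2 = """French
Paris
England
London"""

list2 = text2.split("\n")
print(list2[::2])   # ['French', 'England']  (indices 0, 2, ...)
print(list2[1::2])  # ['Paris', 'London']    (indices 1, 3, ...)
dict2 = dict(zip(list2[::2], list2[1::2]))

merged = []
for country, code in (line.split(",") for line in text1.split("\n")):
    # dict.get returns the default when the country is not a key
    merged.append(",".join([country, dict2.get(country, "key_not_found"), code]))
print("\n".join(merged))
# French,Paris,01
# Brasil,key_not_found,07
# USA,key_not_found,23
# England,London,34
# Egypt,key_not_found,51
```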

Upvotes: 0

amaranaitsaidi

You can combine the two like this to create one common list, and then do whatever you want with it:

list1 = ['French','01', 'Brasil','07']
list2 = ['French','Paris','England','London']

# Append every item of list1 that list2 does not already contain
for element in list1:
    if element not in list2:
        list2.append(element)
    else:
        print(f"{element} is already in list2")

print("list2:")
print(list2)

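The same approach can be wrapped in a function that returns a new list instead of mutating list2 (the name merge_unique is hypothetical). Note that the result is a single flat list, so a country is not paired with its code or capital; records like England,London,34 would still need a lookup such as a dict.

```python
def merge_unique(first, second):
    """Hypothetical helper: second's items followed by any items of first it lacks."""
    combined = list(second)  # copy so the caller's list is not modified
    for element in first:
        if element not in combined:
            combined.append(element)
    return combined

list1 = ['French', '01', 'Brasil', '07']
list2 = ['French', 'Paris', 'England', 'London']
result = merge_unique(list1, list2)
print(result)  # ['French', 'Paris', 'England', 'London', '01', 'Brasil', '07']
```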

Upvotes: 1
