Rv R

Reputation: 311

How to map values from two lists when substring matches

I have values in two different lists:

list1 = [
    "1003_0123_20200821091044_ion_fri_jl.dat",
    "8005_0086_20200821090605_ion_fri_jl.dat",
    "1003_0123_20200821091999_ion_fri_jl.dat",
]

list2 = [
    "IMM CCA CADD USD GAAP_202103311352_20200821091999_FRI",
    "ICM CCA CADD USD GAAP_202103311352_20200821090605_FRI",
    "CCA CTAD USD GAAPA_202103311352_20200821091044_FRI",
]

I want to pair the values that share the same substring, obtained with str.split('_')[2]. For instance, the first element in list1 has the substring 20200821091044, which matches the third element in list2. The matched values should look like this:

[
    (
        "1003_0123_20200821091044_ion_fri_jl.dat",
        "CCA CTAD USD GAAPA_202103311352_20200821091044_FRI",
    ),
    (
        "8005_0086_20200821090605_ion_fri_jl.dat",
        "ICM CCA CADD USD GAAP_202103311352_20200821090605_FRI",
    ),
    (
        "1003_0123_20200821091999_ion_fri_jl.dat",
        "IMM CCA CADD USD GAAP_202103311352_20200821091999_FRI",
    ),
]

or in a dictionary format.

Upvotes: 2

Views: 381

Answers (2)

aneroid

Reputation: 15962

A previous edit of your question said "or in a dictionary format", which is what I'll use here:

import collections

grouped = collections.defaultdict(list)
for item in list1+list2:  # or itertools.chain(list1, list2)
    grouped[item.split('_')[2]].append(item)

grouped is:

defaultdict(list,
            {'20200821091044': ['1003_0123_20200821091044_ion_fri_jl.dat',
              'CCA CTAD USD GAAPA_202103311352_20200821091044_FRI'],
             '20200821090605': ['8005_0086_20200821090605_ion_fri_jl.dat',
              'ICM CCA CADD USD GAAP_202103311352_20200821090605_FRI'],
             '20200821091999': ['1003_0123_20200821091999_ion_fri_jl.dat',
              'IMM CCA CADD USD GAAP_202103311352_20200821091999_FRI']})

Or use list(grouped.values()) to get a list of pairs (each pair is a list, not a tuple):

[['1003_0123_20200821091044_ion_fri_jl.dat',
  'CCA CTAD USD GAAPA_202103311352_20200821091044_FRI'],
 ['8005_0086_20200821090605_ion_fri_jl.dat',
  'ICM CCA CADD USD GAAP_202103311352_20200821090605_FRI'],
 ['1003_0123_20200821091999_ion_fri_jl.dat',
  'IMM CCA CADD USD GAAP_202103311352_20200821091999_FRI']]
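If tuples are needed, as in the question's expected output, the grouped values can be converted. This is only a sketch: it assumes each timestamp occurs at most once per list, and the names pairs and group are mine, not from the question. Groups without a partner in the other list are skipped.

```python
import collections

list1 = [
    "1003_0123_20200821091044_ion_fri_jl.dat",
    "8005_0086_20200821090605_ion_fri_jl.dat",
    "1003_0123_20200821091999_ion_fri_jl.dat",
]
list2 = [
    "IMM CCA CADD USD GAAP_202103311352_20200821091999_FRI",
    "ICM CCA CADD USD GAAP_202103311352_20200821090605_FRI",
    "CCA CTAD USD GAAPA_202103311352_20200821091044_FRI",
]

# Same grouping as above: key is the third underscore-separated field.
grouped = collections.defaultdict(list)
for item in list1 + list2:
    grouped[item.split("_")[2]].append(item)

# Assumption: at most one item per list per key, so a complete group
# has exactly two entries, with the list1 item first.
pairs = [tuple(group) for group in grouped.values() if len(group) == 2]
```

Because list1 is iterated first, each tuple has the list1 item in the first position, and the pairs follow the order of list1.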

Upvotes: 0

GG.

Reputation: 21854

Loop over the first list, extract the substring, then loop over the second list and collect every item that contains it.

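results = []

for x in list1:
    # Third underscore-separated field of the list1 item, e.g. "20200821091044"
    substring = x.split("_")[2]

    for y in list2:
        # Containment test: matches if the substring appears anywhere in y
        if substring in y:
            results.append((x, y))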
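For the "dictionary format" mentioned in the question, one possible variant indexes list2 by its own third field and looks each list1 item up in that index. Treat it as a sketch under assumptions: it compares the fields exactly rather than testing containment, it assumes a timestamp appears at most once in list2 (a later duplicate would overwrite an earlier one), and the names by_timestamp and mapping are made up for illustration.

```python
list1 = [
    "1003_0123_20200821091044_ion_fri_jl.dat",
    "8005_0086_20200821090605_ion_fri_jl.dat",
    "1003_0123_20200821091999_ion_fri_jl.dat",
]
list2 = [
    "IMM CCA CADD USD GAAP_202103311352_20200821091999_FRI",
    "ICM CCA CADD USD GAAP_202103311352_20200821090605_FRI",
    "CCA CTAD USD GAAPA_202103311352_20200821091044_FRI",
]

# Index list2 by its third underscore-separated field.
# Assumption: each timestamp occurs at most once in list2.
by_timestamp = {}
for y in list2:
    by_timestamp[y.split("_")[2]] = y

# Map each list1 item to its list2 partner; items without a partner are skipped.
mapping = {}
for x in list1:
    match = by_timestamp.get(x.split("_")[2])
    if match is not None:
        mapping[x] = match
```

This makes one pass over each list instead of a nested loop, which may matter if the lists are long.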

Upvotes: 2
