How to replace values in one column with values from another table (using regex) in Python: is it even possible?

Question

I have two datasets. The first contains raw values that must be replaced with the acceptable values given in the second. If no matching acceptable value is found in the second dataset, the raw value should be left as it is.

The first looks like this:

SOURCE_ID	TITLE
1	Emaar Beachfront
2	EmaarBeachfront
3	emaar beachfront
4	dubai hills estate
5	Dubai Hills
6	Nad Al Sheba
7	Nadalsheba
8	dubai hills residences
9	The Cove Ru
10	Homes

The second looks like this:

ID	TITLE
1	Emaar Beachfront
2	Dubai Hills
3	Nad Al Sheba
4	The Cove

In the end, my dataset should look like this:

SOURCE_ID	TITLE
1	Emaar Beachfront
2	Emaar Beachfront
3	Emaar Beachfront
4	Dubai Hills
5	Dubai Hills
6	Nad Al Sheba
7	Nad Al Sheba
8	Dubai Hills
9	The Cove
10	Homes

I thought this might be possible with regex, but I am not sure.

user11718531 · Accepted Answer

One solution is to compare the titles after removing spaces and lowercasing them, so no regex is needed:

first = ["Emaar Beachfront",
"EmaarBeachfront",
"emaar beachfront",
"dubai hills estate",
"Dubai Hills",
"Nad Al Sheba",
"Nadalsheba",
"dubai hills residences",
"The Cove Ru",
"Homes"]

second = [
"Emaar Beachfront",
"Dubai Hills",
"Nad Al Sheba",
"The Cove"
]

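# Normalize the accepted titles: drop spaces and lowercase.
second_transformed = [item.replace(" ", "").lower() for item in second]

out = []

for item in first:
    # Normalize the raw title the same way before comparing.
    item_transformed = item.replace(" ", "").lower()
    item_found = False
    for second_item, second_item_transformed in zip(second, second_transformed):
        # Substring match: "dubaihillsestate" contains "dubaihills".
        if second_item_transformed in item_transformed:
            out.append(second_item)
            item_found = True
            break
    # No accepted title matched, so keep the raw value unchanged.
    if not item_found:
        out.append(item)

print(out)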
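
Since the question asks about regex, here is a minimal sketch of the same idea using the standard `re` module. It is not part of the answer above; the helper names `build_patterns` and `canonicalize` are made up for illustration. It assumes that optional whitespace is only allowed where the accepted title has a space, so it is slightly stricter than the space-stripping approach (for example, "Emaar Beach front" would not match).

```python
import re

def build_patterns(accepted):
    # One case-insensitive pattern per accepted title. Each space in the
    # accepted title becomes "optional whitespace", so "Nad Al Sheba"
    # also matches "Nadalsheba".
    patterns = []
    for title in accepted:
        words = [re.escape(word) for word in title.split()]
        pattern = re.compile(r"\s*".join(words), re.IGNORECASE)
        patterns.append((pattern, title))
    return patterns

def canonicalize(raw, patterns):
    # Return the first accepted title whose pattern occurs anywhere in
    # the raw value; if none matches, return the raw value unchanged.
    for pattern, title in patterns:
        if pattern.search(raw):
            return title
    return raw

accepted = ["Emaar Beachfront", "Dubai Hills", "Nad Al Sheba", "The Cove"]
raw_titles = [
    "Emaar Beachfront", "EmaarBeachfront", "emaar beachfront",
    "dubai hills estate", "Dubai Hills", "Nad Al Sheba", "Nadalsheba",
    "dubai hills residences", "The Cove Ru", "Homes",
]

patterns = build_patterns(accepted)
cleaned = [canonicalize(title, patterns) for title in raw_titles]
print(cleaned)
```

If the data lives in pandas DataFrames, the same function could probably be applied to the column with something like `df["TITLE"].map(lambda t: canonicalize(t, patterns))`, assuming the column is named `TITLE` as in the tables above.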
