PySpark: create new column based on dictionary values matching with string in another column

Question

I have a dataframe A that looks like this:

ID	SOME_CODE	TITLE
1	024df3	Large garden in New York, New York
2	0ffw34	Small house in dark Detroit, Michigan
3	93na09	Red carpet in beautiful Miami
4	8339ct	Skyscraper in Los Angeles, California
5	84p3k9	Big shop in northern Boston, Massachusetts

I also have another dataframe, B:

City	Shortcut
Los Angeles	LA
New York	NYC
Miami	MI
Boston	BO
Detroit	DTW

I would like to add a new "SHORTCUT" column to dataframe A, populated whenever the "TITLE" column in A contains a city from the "City" column in dataframe B. I have tried using dataframe B as a dictionary and mapping it onto dataframe A, but I can't get around the fact that the city names appear in the middle of a sentence, so an exact-match lookup fails.

The desired output is:

ID	SOME_CODE	TITLE	SHORTCUT
1	024df3	Large garden in New York, New York	NYC
2	0ffw34	Small house in dark Detroit, Michigan	DTW
3	93na09	Red carpet in beautiful Miami	MI
4	8339ct	Skyscraper in Los Angeles, California	LA
5	84p3k9	Big shop in northern Boston, Massachusetts	BO

I would appreciate any help.

Answers (1)