ah bon
ah bon

Reputation: 10051

Combine two columns if one column str.contain specific characters in Pandas

Given a small test data as follows:

   id city                                           address
0   1   HK           55 Peng Sui Road Che Sham Man Hong Kong
1   2   HK        77 Kun Chok Fong San Tsuen Noi Fok Kowloon
2   3   HK         48 Nut Chok Lou Tsa Sik Kowloon Hong Kong
3   4   HK          Block 69, Hai Ang Court Aberdeen Kowloon
4   5   HK  40 Tsang Tai Kit Street Wak Luet New Territories

If Hong Kong is not contained in address, then I want to create a new column detailed_add by combining df['address'] + ', ' + df['city']:

   id city                                           address                                       detailed_add
0   1   HK           55 Peng Sui Road Che Sham Man Hong Kong            55 Peng Sui Road Che Sham Man Hong Kong
1   2   HK        77 Kun Chok Fong San Tsuen Noi Fok Kowloon     77 Kun Chok Fong San Tsuen Noi Fok Kowloon, HK
2   3   HK         48 Nut Chok Lou Tsa Sik Kowloon Hong Kong          48 Nut Chok Lou Tsa Sik Kowloon Hong Kong
3   4   HK          Block 69, Hai Ang Court Aberdeen Kowloon       Block 69, Hai Ang Court Aberdeen Kowloon, HK
4   5   HK  40 Tsang Tai Kit Street Wak Luet New Territories  40 Tsang Tai Kit Street Wak Luet New Territori..., HK

To filter rows: df[~df['address'].str.contains('Hong Kong', na = False)].

   id city                                           address
1   2   HK        77 Kun Chok Fong San Tsuen Noi Fok Kowloon
3   4   HK          Block 69, Hai Ang Court Aberdeen Kowloon
4   5   HK  40 Tsang Tai Kit Street Wak Luet New Territories

How could I do that? Thanks.

Upvotes: 0

Views: 34

Answers (3)

wwnde
wwnde

Reputation: 26676

df["detailed_add"]=np.where(~df.address.str.contains("Hong Kong"), df['address'].str.cat(df['city'], sep=","),df['address'])

df.head(2)

 id city                                     address  \
0   1   HK     55 Peng Sui Road Che Sham Man Hong Kong   
1   2   HK  77 Kun Chok Fong San Tsuen Noi Fok Kowloon   

                                    detailed_add  
0        55 Peng Sui Road Che Sham Man Hong Kong  
1  77 Kun Chok Fong San Tsuen Noi Fok Kowloon,HK  

Upvotes: 1

Ynjxsjmh
Ynjxsjmh

Reputation: 30050

You can use np.where()

df['detailed_add'] = np.where(df['address'].str.contains('Hong Kong'), df['address'], df['address'] + ', ' + df['city'])
print(df)

   id city                                           address                                       detailed_add
0   1   HK           55 Peng Sui Road Che Sham Man Hong Kong            55 Peng Sui Road Che Sham Man Hong Kong
1   2   HK        77 Kun Chok Fong San Tsuen Noi Fok Kowloon     77 Kun Chok Fong San Tsuen Noi Fok Kowloon, HK
2   3   HK         48 Nut Chok Lou Tsa Sik Kowloon Hong Kong          48 Nut Chok Lou Tsa Sik Kowloon Hong Kong
3   4   HK          Block 69, Hai Ang Court Aberdeen Kowloon       Block 69, Hai Ang Court Aberdeen Kowloon, HK
4   5   HK  40 Tsang Tai Kit Street Wak Luet New Territories  40 Tsang Tai Kit Street Wak Luet New Territori...

Upvotes: 2

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521914

You may try the following:

df["new_column"] = df["address"] if df["address"].str.contains(r'\bHong Kong\b') else df["address"] + ', HK'

The above logic just pulls over the current address if it already contains Hong Kong in it. Otherwise, it appends , HK to the end of the current address.

Upvotes: 1

Related Questions