Petr Petrov
Petr Petrov

Reputation: 4452

Pandas: replace values in columns using regex

I have 2 dataframe and I need to get new column to first dataframe, using values from second FIrse df is

ID,"url","used_at","active_seconds"
8075643aab791cec7dc9d18926958b67,"sberbank.ru/ru/person/promo/10mnl?utm_source=Vesti.ru&utm_medium=html&utm_campaign=10_million_users_SBOL_dec2015&utm_term=every14_syncbanners",2016-01-01 00:03:16,183
a04a8041ffa6fe1b85471ca5af1ee575,"online.rsb.ru/hb/faces/system/login/rslogin.jsp?credit=false",2016-01-01 00:04:36,42
a04a8041ffa6fe1b85471ca5af1ee575,"online.rsb.ru/hb/faces/system/login/sms/sms.jsp?smsAuth=true",2016-01-01 00:05:18,22
a04a8041ffa6fe1b85471ca5af1ee575,"online.rsb.ru/hb/faces/rs/RSIndex.jspx",2016-01-01 00:05:40,14
a04a8041ffa6fe1b85471ca5af1ee575,"online.rsb.ru/hb/faces/rs/payments/PaymentReq.jspx",2016-01-01 00:05:54,22
ba880911a6d54f6ea6d3145081a0e0dd,"homecredit.ru/help/quest/feedback.php",2016-01-01 00:06:12,2

Second df looks like

URL Code
citibank\.ru\/russia\/info\/rus\/contacts_form\.htm 15
citibank\.ru\/russia\/info\/rus\/contacts\.htm  15
gazprombank\.ru\/contacts\/ 15
gazprombank\.ru\/feedback\/ 15
gazprombank\.ru\/additional_office\/    15
homecredit\.ru\/help\/quest\/feedback\.php  15
homecredit\.ru\/offices\/*  15

If I don't have a regex, I use

df1['code'] = df1.url.map(df2.set_index('URL')['Code'])

But I can't do this, because df2.URL is regex. But

df1['code'] = df1['url'].replace(df2['URL'], df2['Code'], regex=True)

doesn't work.

Upvotes: 1

Views: 972

Answers (1)

olegsson
olegsson

Reputation: 103

As per my comment, the pandas.Series.replace() method doesn't allow Series objects as the to_replace and value arguments. Passing lists instead works:

df1['code'] = df1.url.replace(df2.URL.values, df2.Code.values, regex=True)
print df1[['url', 'code']]

produces the following output:

                                                 url  \
0  sberbank.ru/ru/person/promo/10mnl?utm_source=V...   
1  online.rsb.ru/hb/faces/system/login/rslogin.js...   
2  online.rsb.ru/hb/faces/system/login/sms/sms.js...   
3             online.rsb.ru/hb/faces/rs/RSIndex.jspx   
4  online.rsb.ru/hb/faces/rs/payments/PaymentReq....   
5              homecredit.ru/help/quest/feedback.php   

                                                code  
0  sberbank.ru/ru/person/promo/10mnl?utm_source=V...  
1  online.rsb.ru/hb/faces/system/login/rslogin.js...  
2  online.rsb.ru/hb/faces/system/login/sms/sms.js...  
3             online.rsb.ru/hb/faces/rs/RSIndex.jspx  
4  online.rsb.ru/hb/faces/rs/payments/PaymentReq....  
5                                                 15

In answer to your additional comments, you can't get df2.Code in df1.code in rows where df1.url doesn't match any of the regex strings, but you can provide a value (e.g. None) for these cases to be put in the column instead. This is, for example, done by adding the following line:

df1['code'] = df1.apply(lambda x: None if x.code == x.url else x.code, axis=1)

where print df1[['url', 'code']] returns the following:

                                                 url  code
0  sberbank.ru/ru/person/promo/10mnl?utm_source=V...   NaN
1  online.rsb.ru/hb/faces/system/login/rslogin.js...   NaN
2  online.rsb.ru/hb/faces/system/login/sms/sms.js...   NaN
3             online.rsb.ru/hb/faces/rs/RSIndex.jspx   NaN
4  online.rsb.ru/hb/faces/rs/payments/PaymentReq....   NaN
5              homecredit.ru/help/quest/feedback.php  15.0

Upvotes: 1

Related Questions