Reputation:
I have a 'leads' dataset with a 'ref_url' column. This column contains links, and I want to parse each one, keep only a particular part of it, and replace the old value with the parsed value.
This is what the old values look like:
https://regalia-deyaar.sales-centre.properties/?utm_source=email&utm_medium=mailerlite&utm_campaign=regalia&utm_id=regalia
This is what I want them to look like:
https://regalia-deyaar.sales-centre.properties/
Here is what I did:
from urllib.parse import urlparse

def parsing_url(Series):
    for rows in Series:
        parsed_url = urlparse(rows)
        parsed=(f"{parsed_url.scheme}://{parsed_url.netloc}{parsed_url.path}")
        rows=parsed

leads['ref_url'].apply(parsing_url)
However, this didn't work. It returned only NaN values. Can you help me, please?
Upvotes: 0
Views: 186
Reputation: 470
I assume you are using pandas. You can use apply with a lambda that splits each string at the "?" and keeps the first part:
import pandas as pd

df = pd.DataFrame({
    'url': [
        "https://regalia-deyaar.sales-centre.properties/?utm_source=email&utm_medium=mailerlite&utm_campaign=regalia&utm_id=regalia",
        "https://regalia-deyaar.sales-centre.properties/?utm_source=email&utm_medium=mailerlite&utm_campaign=regalia&utm_id=regalia",
        "https://regalia-deyaar.sales-centre.properties/?utm_source=email&utm_medium=mailerlite&utm_campaign=regalia&utm_id=regalia",
    ]
})
# Split each string at the first "?" and keep the part before it.
# This assumes the base URL itself never contains "?".
df['url'] = df['url'].apply(lambda x: x.split("?", 1)[0])
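As for why the original attempt produced only missing values: Series.apply calls the function once per value, so the argument is a single string rather than the whole column, and a function with no return statement gives None for every row. If you would rather keep urlparse, a minimal sketch could look like the following. The helper name strip_query and the small sample frame are only illustrative, and I am assuming missing entries should be left as they are.

```python
from urllib.parse import urlparse

import pandas as pd


def strip_query(url):
    """Return scheme://netloc/path for a single URL, dropping query and fragment."""
    if not isinstance(url, str):
        return url  # assumption: leave NaN/None entries untouched
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.netloc}{parts.path}"


# Hypothetical sample data standing in for the 'leads' dataset
leads = pd.DataFrame({
    "ref_url": [
        "https://regalia-deyaar.sales-centre.properties/?utm_source=email&utm_medium=mailerlite&utm_campaign=regalia&utm_id=regalia",
        None,
    ]
})

# apply() passes one value at a time; assign the result back to replace the column
leads["ref_url"] = leads["ref_url"].apply(strip_query)
```

Note that apply returns a new Series and does not modify the column in place, so the assignment back to leads["ref_url"] is what actually replaces the old values.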
Upvotes: 0