Reputation: 337
I have a pandas DataFrame named full_list with a string column named domains. Part of it is shown here:
domains
0 naturalhealth365.com
1 truththeory.com
2 themillenniumreport.com
3 https://www.cernovich.com
4 https://www.christianpost.com
5 http://evolutionnews.org
6 http://www.greenmedinfo.com
7 http://www.magapill.com8
8 https://needtoknow.news
I need to remove the leading https:// or http:// from the website names.
I checked multiple pandas posts on SO dealing with vaguely similar issues, and I have tried all of these methods:
full_list['domains'] = full_list['domains'].apply(lambda x: x.lstrip('http://'))
but that erroneously removes leading t, h and p characters as well, i.e. "truththeory.com" (index 1) becomes "ruththeory.com"
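This happens because str.lstrip treats its argument as a set of characters to strip from the left, not as a prefix to match. A minimal sketch in plain Python follows; the helper name strip_scheme is made up for illustration and is not part of pandas or the standard library.

```python
# str.lstrip removes any leading characters found in the set
# {'h', 't', 'p', ':', '/'}; it does not look for the substring 'http://'.
stripped = "truththeory.com".lstrip("http://")
print(stripped)  # ruththeory.com


def strip_scheme(url: str) -> str:
    """Hypothetical helper: drop a leading http:// or https:// only."""
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            return url[len(prefix):]
    return url


print(strip_scheme("truththeory.com"))            # truththeory.com
print(strip_scheme("https://www.cernovich.com"))  # www.cernovich.com
```

A helper like this could be passed to apply in place of the lambda, although the vectorized str methods are usually the more idiomatic choice.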
full_list['domains'] = full_list['domains'].replace(('http://', ''))
and this makes no changes to the strings at all: before and after the line runs, the values in domains stay the same
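One likely explanation is that Series.replace (unlike Series.str.replace) compares whole cell values by default, so a cell that merely contains http:// is left alone. The sketch below uses a small made-up Series rather than the real full_list data, and passes the search string and replacement as two separate arguments.

```python
import pandas as pd

# Made-up sample; the last cell is exactly 'http://' to show a whole-value match.
s = pd.Series(["truththeory.com", "http://evolutionnews.org", "http://"])

# Without regex=True, Series.replace only swaps cells whose entire value
# equals the search string, so only the last cell changes here.
whole_value = s.replace("http://", "")

# With regex=True the pattern is applied inside each string.
substring = s.replace(r"^https?://", "", regex=True)

print(whole_value.tolist())
print(substring.tolist())
```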
full_list['domains'] = full_list['domains'].str.replace(('http://', ''))
gives the error replace() missing 1 required positional argument: 'repl'
full_list['domains'] = full_list['domains'].str.split('//', n=1).str.get(1)
makes the first 3 rows (index 0, 1, 2) nan
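The NaN values appear because rows without '//' split into a one-element list, so there is nothing at position 1. A small sketch on made-up data, assuming the split is on '//' with n=1; taking the last element instead of element 1 covers both cases.

```python
import pandas as pd

s = pd.Series(["naturalhealth365.com", "https://www.cernovich.com"])

parts = s.str.split("//", n=1)

# Rows without '//' yield a one-element list, so index 1 is missing (NaN).
second = parts.str.get(1)

# Taking the last element works for both one- and two-element lists.
last = parts.str[-1]

print(second.tolist())
print(last.tolist())
```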
For the life of me, I am unable to see what I am doing wrong. Any help is appreciated.
Upvotes: 2
Views: 155
Reputation: 71610
Try str.replace with a regex like the following:
>>> df['domains'].str.replace(r'http(s|)://', '', regex=True)
0 naturalhealth365.com
1 truththeory.com
2 themillenniumreport.com
3 www.cernovich.com
4 www.christianpost.com
5 evolutionnews.org
6 www.greenmedinfo.com
7 www.magapill.com8
8 needtoknow.news
Name: domains, dtype: object
>>>
Upvotes: 1
Reputation: 863281
Use Series.str.replace with the regex ^ for start of string and [s]* for an optional s:
df['domains'] = df['domains'].str.replace(r'^http[s]*://', '', regex=True)
print(df)
domains
0 naturalhealth365.com
1 truththeory.com
2 themillenniumreport.com
3 www.cernovich.com
4 www.christianpost.com
5 evolutionnews.org
6 www.greenmedinfo.com
7 www.magapill.com8
8 needtoknow.news
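The ^ anchor matters when a scheme can also appear later in the string. The sketch below uses Python's re module on a made-up value (not from the question's data) to compare an unanchored pattern with an anchored one. It writes the optional s as s?, which matches zero or one s, whereas [s]* would also accept several.

```python
import re

# Hypothetical value with a second URL embedded in the query string.
value = "http://example.com/?next=http://other.org"

# Unanchored: every occurrence of the scheme is removed.
unanchored = re.sub(r"https?://", "", value)

# Anchored with ^: only a scheme at the very start is removed.
anchored = re.sub(r"^https?://", "", value)

print(unanchored)  # example.com/?next=other.org
print(anchored)    # example.com/?next=http://other.org
```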
Upvotes: 1