Reputation: 97
I have a df containing user and url looking like this.
df
User Url
1 http://www.mycompany.com/Overview/Get
2 http://www.mycompany.com/News
3 http://www.mycompany.com/Accountinfo
4 http://www.mycompany.com/Personalinformation/Index
...
I want to add another column page that only takes the second part of the url, so I'd be having it like this.
user url page
1 http://www.mycompany.com/Overview/Get Overview
2 http://www.mycompany.com/News News
3 http://www.mycompany.com/Accountinfo Accountinfo
4 http://www.mycompany.com/Personalinformation/Index Personalinformation
...
My code below is not working.
slashparts = df['url'].split('/')
df['page'] = slashparts[4]
The error I'm getting
AttributeError Traceback (most recent call last)
<ipython-input-23-0350a98a788c> in <module>()
----> 1 slashparts = df['request_url'].split('/')
2 df['page'] = slashparts[1]
~\Anaconda\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
4370 if
self._info_axis._can_hold_identifiers_and_holds_name(name):
4371 return self[name]
-> 4372 return object.__getattribute__(self, name)
4373
4374 def __setattr__(self, name, value):
AttributeError: 'Series' object has no attribute 'split'
Upvotes: 3
Views: 1154
Reputation: 294278
I'm attempting to be a little more explicit to handle where http
might be missing and other variations
pat = '(?:https?://)?(?:www\.)?(?:\w+\.\w+\/)([^/]*)'
df.assign(page=df.Url.str.extract(pat, expand=False))
User Url page
0 1 http://www.mycompany.com/Overview/Get Overview
1 2 http://www.mycompany.com/News News
2 3 www.mycompany.com/Accountinfo Accountinfo
3 1 http://www.mycompany.com/Overview/Get Overview
4 2 mycompany.com/News News
5 3 https://www.mycompany.com/Accountinfo Accountinfo
6 4 http://www.mycompany.com/Personalinformation/I... Personalinformation
Upvotes: 2
Reputation: 862681
Use pandas text functions with str
and for select 4.
lists use str[3]
, because python counts from 0
:
df['page'] = df['Url'].str.split('/').str[3]
Or if performance is important use list comprehension
:
df['page'] = [x.split('/')[3] for x in df['Url']]
print (df)
User Url \
0 1 http://www.mycompany.com/Overview/Get
1 2 http://www.mycompany.com/News
2 3 http://www.mycompany.com/Accountinfo
3 4 http://www.mycompany.com/Personalinformation/I...
page
0 Overview
1 News
2 Accountinfo
3 Personalinformation
Upvotes: 4