Reputation: 391
I have a series of URLs:
www.domain.com/calendar.php?month=may.2019
www.domain.com/calendar.php?month=april.2019
www.domain.com/calendar.php?month=march.2019
www.domain.com/calendar.php?month=feb.2019
...
...
...
www.domain.com/calendar.php?month=feb.2007
I want to extract the year that follows the month.
What I'm looking for:
2019
2019
...
...
2007
and save them into another column.
Here's what I have:
data["urls"].str.extract('(?<=month=).*$')
Upvotes: 1
Views: 48
Reputation: 27723
Here, we can also use a simple expression without look-arounds, such as:
.+month=.+\.([0-9]{4})
or:
month=.+\.([0-9]{4})
or:
.+month=.+\.(.+)
or:
month=.+\.(.+)
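As a quick check of the second expression, here is a minimal sketch using Python's built-in re module rather than pandas. The helper name extract_year and the sample list are only for illustration; the two URLs are taken from the question.

```python
import re

# Sample URLs copied from the question.
urls = [
    "www.domain.com/calendar.php?month=may.2019",
    "www.domain.com/calendar.php?month=feb.2007",
]

# After "month=", skip to the last dot that is followed by four digits
# and capture those digits.
pattern = re.compile(r"month=.+\.([0-9]{4})")

def extract_year(url):
    """Return the captured year as a string, or None if the URL does not match."""
    match = pattern.search(url)
    return match.group(1) if match else None

years = [extract_year(u) for u in urls]
print(years)
```

The [0-9]{4} variants only accept a four-digit year, while the (.+) variants capture whatever follows the last dot, so the former is the safer choice if some URLs might be malformed.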
Upvotes: 0
Reputation: 294218
df["urls"].str.extract(r'(?<=month=).*\.(\d{4})$')
If you can trust that all URLs follow the same pattern, then these should work too.
rsplit
df["urls"].str.rsplit('.', n=1).str[-1]
slicing
df["urls"].str[-4:]
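To put the three options side by side and save the result into a new column, here is a small sketch. The column name "year" and the two-row frame are assumptions for illustration; the URLs come from the question.

```python
import pandas as pd

# Two sample rows from the question; the real frame would hold the full list.
df = pd.DataFrame({"urls": [
    "www.domain.com/calendar.php?month=may.2019",
    "www.domain.com/calendar.php?month=feb.2007",
]})

# With a single capture group, expand=False returns a Series,
# which can be assigned directly to a new column.
df["year"] = df["urls"].str.extract(r"(?<=month=).*\.(\d{4})$", expand=False)

# These give the same result only because every URL ends in ".YYYY".
by_split = df["urls"].str.rsplit(".", n=1).str[-1]
by_slice = df["urls"].str[-4:]

print(df)
```

Note that str.extract needs at least one capture group in the pattern, and rows that do not match come back as NaN, whereas the rsplit and slice versions return whatever text happens to sit at the end of the string.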
Upvotes: 4