Reputation: 81
movies
| Movies | Release Date |
| -------- | -------------- |
| Star Wars: Episode VII - The Force Awakens (2015) | December 16, 2015 |
| Avengers: Endgame (2019) | April 24, 2019 |
I am trying to add a new column that holds the year, using split on the "Release Date" column.
import pandas as pd
movies = pd.DataFrame({'Movies': ['Star Wars: Episode VII - The Force Awakens (2015)', 'Avengers: Endgame (2019)'],
                       'Release Date': ['December 16, 2015', 'April 24, 2019']})
movies["year"] = 0
movies["year"] = movies["Release Date"].str.split(",")[1]
movies["year"]
Expected output:
| Movies | year |
| -------- | -------------- |
| Star Wars: Episode VII - The Force Awakens (2015) | 2015 |
| Avengers: Endgame (2019) | 2019 |
But I get this error:
> ValueError: Length of values does not match length of index
Upvotes: 2
Views: 49
Reputation: 1388
movies["Release Date"].str.split(",")
returns a Series of the lists returned by split(), one list per row.
movies["Release Date"].str.split(",")[1]
returns the second element of that Series, i.e. the whole list for the row labelled 1, not the second item of each list.
This is obviously not what you want.
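To make the difference concrete, here is a minimal sketch using the two rows from the question (the names `parts` and `second_row` are only illustrative, and a default integer index is assumed):

```python
import pandas as pd

movies = pd.DataFrame({"Release Date": ["December 16, 2015", "April 24, 2019"]})

# One Python list per row
parts = movies["Release Date"].str.split(",")
print(parts[0])  # ['December 16', ' 2015']

# [1] on the Series selects the row labelled 1, not item 1 of every list
second_row = parts[1]
print(second_row)  # ['April 24', ' 2019']
```

Assigning that single list to a column only works if its length happens to match the number of rows, which is why a longer frame raises the ValueError.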
Either keep using pandas' str.split, but then apply a function that gets the second item of each row's list, for example:
movies["Release Date"].str.split(",").map(lambda x: x[1])
Or do something different, as suggested by @Tim Biegeleisen.
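As a possible variation on the split approach, the `.str` accessor can also index into each row's list, which avoids the lambda. The sketch below assumes every value contains exactly one comma followed by the year; the strip step is there because the split leaves a leading space (`' 2015'`). The name `year_text` is only illustrative.

```python
import pandas as pd

movies = pd.DataFrame({
    "Movies": ["Star Wars: Episode VII - The Force Awakens (2015)",
               "Avengers: Endgame (2019)"],
    "Release Date": ["December 16, 2015", "April 24, 2019"],
})

# .str[1] takes item 1 of each row's list; .str.strip() drops the leading space
year_text = movies["Release Date"].str.split(",").str[1].str.strip()

# Convert the text to integers (fails if any row has no year after the comma)
movies["year"] = year_text.astype(int)
print(movies["year"].tolist())  # [2015, 2019]
```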
Upvotes: 0
Reputation: 521629
Using str.extract, we can target the 4-digit year:
df["year"] = df["Release Date"].str.extract(r'\b(\d{4})\b')
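For reference, a small self-contained sketch of the same idea. It assumes you want a numeric column, and it adds a hypothetical third row (`"unknown"`, not part of the question's data) to show what happens when no year is found. Passing `expand=False` makes `str.extract` return a Series rather than a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    # The last row is a made-up value with no year, for illustration only
    "Release Date": ["December 16, 2015", "April 24, 2019", "unknown"],
})

# Capture the first standalone run of four digits; non-matching rows become NaN
year = df["Release Date"].str.extract(r"\b(\d{4})\b", expand=False)

# Nullable integer dtype keeps the missing value instead of raising
df["year"] = pd.to_numeric(year).astype("Int64")
print(df)
```

Unlike the split-based approach, this does not depend on where the comma is, only on a four-digit number being present.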
Upvotes: 2