Reputation: 1081
I have a DataFrame like so:
             Year        Player
46  Jan. 17, 1971  Chuck Howley
47  Jan. 11, 1970    Len Dawson
48  Jan. 12, 1969    Joe Namath
49  Jan. 14, 1968    Bart Starr
50  Jan. 15, 1967    Bart Starr
and I want df_MVPs['Year'] to contain only the year. My current method is:
df_MVPs['Year'] = df_MVPs['Year'].str.replace(df_MVPs['Year'][:7], '')
but this raises an error. Is there a simpler way to do this?
EDIT: I want my DataFrame to look like:
    Year        Player
46  1971  Chuck Howley
47  1970    Len Dawson
48  1969    Joe Namath
49  1968    Bart Starr
50  1967    Bart Starr
Upvotes: 2
Views: 177
Reputation: 210982
I'd use the .str.extract() method instead:
In [10]: df
Out[10]:
             Year        Player
46  Jan. 17, 1971  Chuck Howley
47  Jan. 11, 1970    Len Dawson
48  Jan. 12, 1969    Joe Namath
49  Jan. 14, 1968    Bart Starr
50  Jan. 15, 1967    Bart Starr
In [11]: df.Year.str.extract(r'.*(\d{4})$', expand=True)
Out[11]:
       0
46  1971
47  1970
48  1969
49  1968
50  1967
but you can also use .str.replace():
In [13]: df.Year.str.replace(r'.*(\d{4})$', r'\1', regex=True)
Out[13]:
46 1971
47 1970
48 1969
49 1968
50 1967
Name: Year, dtype: object
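In the RegEx (regular expression) .*(\d{4})$, the .* matches any leading characters, (\d{4}) captures four digits, and $ anchors the match to the end of the string, so only the trailing year is kept.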
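For anyone who wants to try the regex approach end to end, here is a minimal sketch. It rebuilds the sample data from the question by hand (the frame name df_MVPs comes from the question; the intermediate name years is only illustrative), uses expand=False so that .str.extract() returns a Series rather than a one-column DataFrame, and converts the result to integers, which the question did not ask for but is often what you want.

```python
import pandas as pd

# Sample data copied from the question.
df_MVPs = pd.DataFrame(
    {
        "Year": ["Jan. 17, 1971", "Jan. 11, 1970", "Jan. 12, 1969",
                 "Jan. 14, 1968", "Jan. 15, 1967"],
        "Player": ["Chuck Howley", "Len Dawson", "Joe Namath",
                   "Bart Starr", "Bart Starr"],
    },
    index=[46, 47, 48, 49, 50],
)

# Capture the four digits at the end of each string.
# expand=False returns a Series because there is a single capture group.
years = df_MVPs["Year"].str.extract(r"(\d{4})$", expand=False)

# The extracted values are still strings; cast them if numbers are needed.
df_MVPs["Year"] = years.astype(int)
```

If some rows might not end in four digits, .str.extract() leaves NaN there and the astype(int) step would fail, so check for missing values first in that case.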
Upvotes: 0
Reputation: 109726
You could take the last four characters of the string:
df_MVPs['Year'] = df_MVPs['Year'].str[-4:]
>>> df_MVPs
    Year        Player
46  1971  Chuck Howley
47  1970    Len Dawson
48  1969    Joe Namath
49  1968    Bart Starr
50  1967    Bart Starr
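One caveat with slicing: it relies on the year being exactly the last four characters, so trailing whitespace would shift the result. The sketch below is a slightly more defensive variant under that assumption; the extra .str.strip() call and the padded sample value are additions for illustration, not part of the original data.

```python
import pandas as pd

# Sample data from the question; the trailing space on the first value
# is added here on purpose to show why stripping can matter.
df_MVPs = pd.DataFrame(
    {
        "Year": ["Jan. 17, 1971 ", "Jan. 11, 1970", "Jan. 12, 1969"],
        "Player": ["Chuck Howley", "Len Dawson", "Joe Namath"],
    },
    index=[46, 47, 48],
)

# Remove surrounding whitespace, then keep the last four characters.
df_MVPs["Year"] = df_MVPs["Year"].str.strip().str[-4:]
```

The column still holds strings after this; append .astype(int) if you need numeric years.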
Upvotes: 1
Reputation: 8703
Aw man, convert to a datetime then get the year:
df_MVPs['Year'] = pd.to_datetime(df_MVPs['Year'], format='%b. %d, %Y').dt.year
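Here is the same idea as a self-contained sketch, with the sample data from the question rebuilt by hand. The format string assumes every value looks like "Jan. 17, 1971" (abbreviated month, a period, day, comma, four-digit year); a value in any other layout would raise an error with this format.

```python
import pandas as pd

# Sample data copied from the question.
df_MVPs = pd.DataFrame(
    {
        "Year": ["Jan. 17, 1971", "Jan. 11, 1970", "Jan. 12, 1969",
                 "Jan. 14, 1968", "Jan. 15, 1967"],
        "Player": ["Chuck Howley", "Len Dawson", "Joe Namath",
                   "Bart Starr", "Bart Starr"],
    },
    index=[46, 47, 48, 49, 50],
)

# Parse the strings into datetimes, then pull out the integer year.
parsed = pd.to_datetime(df_MVPs["Year"], format="%b. %d, %Y")
df_MVPs["Year"] = parsed.dt.year
```

Unlike the string-based approaches, this yields an integer column directly and validates that each value is a real date along the way.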
Upvotes: 5