Mark

Reputation: 1081

How to properly use str.replace() with Pandas DataFrame

I have a DataFrame like this:

             Year        Player
46  Jan. 17, 1971  Chuck Howley
47  Jan. 11, 1970    Len Dawson
48  Jan. 12, 1969    Joe Namath
49  Jan. 14, 1968    Bart Starr
50  Jan. 15, 1967    Bart Starr

I want df_MVPs['Year'] to contain only the year. My current approach is:

df_MVPs['Year'] = df_MVPs['Year'].str.replace(df_MVPs['Year'][:7], '')

but this raises an error. Is there a simpler way to do this?

EDIT: I want my DataFrame to look like:

    Year        Player
46  1971  Chuck Howley
47  1970    Len Dawson
48  1969    Joe Namath
49  1968    Bart Starr
50  1967    Bart Starr
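The attempt above fails because df_MVPs['Year'][:7] slices the first seven rows of the Series, not the first seven characters of each string, so str.replace() receives a whole Series where it expects a single pattern string. A minimal sketch that rebuilds the sample data by hand (index 46 to 50 copied from the printout) and shows the difference; iloc[:7] is used here as the explicit spelling of the positional row slice:

```python
import pandas as pd

# Rebuild the sample frame from the question
df_MVPs = pd.DataFrame(
    {
        "Year": ["Jan. 17, 1971", "Jan. 11, 1970", "Jan. 12, 1969",
                 "Jan. 14, 1968", "Jan. 15, 1967"],
        "Player": ["Chuck Howley", "Len Dawson", "Joe Namath",
                   "Bart Starr", "Bart Starr"],
    },
    index=range(46, 51),
)

# Slicing the Series selects rows, not characters: the result is a Series,
# not a string, so it cannot serve as a pattern for str.replace()
rows = df_MVPs["Year"].iloc[:7]
print(type(rows).__name__)  # Series
print(len(rows))            # 5 (only five rows exist)

# The .str accessor slices characters inside each string instead
prefixes = df_MVPs["Year"].str[:7]
print(prefixes.tolist())
```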

Upvotes: 2

Views: 177

Answers (3)

MaxU - stand with Ukraine

Reputation: 210982

I'd use the .str.extract() method instead:

In [10]: df
Out[10]:
             Year        Player
46  Jan. 17, 1971  Chuck Howley
47  Jan. 11, 1970    Len Dawson
48  Jan. 12, 1969    Joe Namath
49  Jan. 14, 1968    Bart Starr
50  Jan. 15, 1967    Bart Starr

In [11]: df.Year.str.extract(r'.*(\d{4})$', expand=True)
Out[11]:
       0
46  1971
47  1970
48  1969
49  1968
50  1967

but you can also use .str.replace():

In [13]: df.Year.str.replace(r'.*(\d{4})$', r'\1', regex=True)
Out[13]:
46    1971
47    1970
48    1969
49    1968
50    1967
Name: Year, dtype: object

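Here is a link that explains the .*(\d{4})$ regular expression (RegEx). In short, .* greedily matches everything before the last four characters, (\d{4}) captures exactly four digits, and $ anchors the match to the end of the string.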
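A minimal sketch of both approaches on a current pandas, assuming the sample data is rebuilt by hand and that the result should overwrite the Year column. The expand=False and regex=True arguments are additions, not part of the answer above: expand=False returns a plain Series for a single capture group, and regex=True is required because pandas 2.0 changed the default of str.replace() to literal (non-regex) matching.

```python
import pandas as pd

df_MVPs = pd.DataFrame(
    {
        "Year": ["Jan. 17, 1971", "Jan. 11, 1970", "Jan. 12, 1969",
                 "Jan. 14, 1968", "Jan. 15, 1967"],
        "Player": ["Chuck Howley", "Len Dawson", "Joe Namath",
                   "Bart Starr", "Bart Starr"],
    },
    index=range(46, 51),
)

# Raw string so that \d reaches the regex engine unchanged;
# the greedy .* lets the group capture the final four digits
pattern = r".*(\d{4})$"

# expand=False: one capture group comes back as a Series, not a DataFrame
via_extract = df_MVPs["Year"].str.extract(pattern, expand=False)

# regex=True must be explicit; since pandas 2.0 the default is a literal match
via_replace = df_MVPs["Year"].str.replace(pattern, r"\1", regex=True)

# Either result can be assigned straight back to the column
df_MVPs["Year"] = via_extract
print(df_MVPs)
```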

Upvotes: 0

Alexander

Reputation: 109726

You could take the last four characters of the string:

df_MVPs['Year'] = df_MVPs['Year'].str[-4:]

>>> df_MVPs
    Year        Player
46  1971  Chuck Howley
47  1970    Len Dawson
48  1969    Joe Namath
49  1968    Bart Starr
50  1967    Bart Starr
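Slicing keeps the year as a string. A sketch, assuming you also want an integer column: the astype(int) conversion and the fullmatch() check are additions, not part of the answer, and the check guards against values that don't end in a four-digit year.

```python
import pandas as pd

df_MVPs = pd.DataFrame(
    {
        "Year": ["Jan. 17, 1971", "Jan. 11, 1970", "Jan. 12, 1969",
                 "Jan. 14, 1968", "Jan. 15, 1967"],
        "Player": ["Chuck Howley", "Len Dawson", "Joe Namath",
                   "Bart Starr", "Bart Starr"],
    },
    index=range(46, 51),
)

# Keep the last four characters; assumes every entry ends in a
# four-digit year (no trailing spaces or footnote marks)
years = df_MVPs["Year"].str[-4:]

# Verify the assumption before converting to integers
if not years.str.fullmatch(r"\d{4}").all():
    raise ValueError("some Year values do not end in a four-digit year")

df_MVPs["Year"] = years.astype(int)
print(df_MVPs)
```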

Upvotes: 1

Kartik

Reputation: 8703

Aw man, just convert to a datetime and then get the year:

df_MVPs['Year'] = pd.to_datetime(df_MVPs['Year'], format='%b. %d, %Y').dt.year
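A self-contained version of this approach, again rebuilding the sample data by hand. It assumes English abbreviated month names: %b matches "Jan" or "Feb" but would not match a nonstandard abbreviation such as "Sept.", which would make to_datetime raise.

```python
import pandas as pd

df_MVPs = pd.DataFrame(
    {
        "Year": ["Jan. 17, 1971", "Jan. 11, 1970", "Jan. 12, 1969",
                 "Jan. 14, 1968", "Jan. 15, 1967"],
        "Player": ["Chuck Howley", "Len Dawson", "Joe Namath",
                   "Bart Starr", "Bart Starr"],
    },
    index=range(46, 51),
)

# %b is the abbreviated month name; the literal "." and "," in the
# format string match the punctuation in the data
dates = pd.to_datetime(df_MVPs["Year"], format="%b. %d, %Y")

# .dt.year pulls the year out as an integer Series
df_MVPs["Year"] = dates.dt.year
print(df_MVPs)
```

If some rows might not fit the format, passing errors="coerce" turns them into NaT instead of raising, at the cost of the year column becoming float with NaN for those rows.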

Upvotes: 5
