star_it8293

Reputation: 439

Remove the datetime string from each row in a Dataframe

I have a dataframe that looks like this:

   Film      Description       
0  Batman    Viewed in 2021-10-04T14:30:31Z City Hall, London
1  Superman  Aired 2012-01-04R11:01:10Z in the USA first
2  Hulk      2010-07-04S07:22:02Z Still being produced

I want to remove the date-time from each row in the 'Description' column, to look like this:

    Film      Description      
0   Batman    Viewed in City Hall, London
1   Superman  Aired in the USA first
2   Hulk      Still being produced

I have attempted this string regex:

df['Description'] = df['Description '].str.replace(r'\^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z', '')

Upvotes: 2

Views: 573

Answers (3)

wwnde

Reputation: 26676

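Use str.replace with a pattern that matches any of three things: a run of non-whitespace characters immediately before a colon, a run of non-whitespace characters immediately after a colon, or the colon itself.

    # regex=True is needed on newer pandas versions, where str.replace treats the pattern literally by default
    df['Description'] = df['Description'].str.replace(r'\S+(?=[:])|(?<=[:])\S+|[:]', '', regex=True)
    print(df)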



       Film             Description
0    Batman  Viewed in  City Hall, London
1  Superman       Aired  in the USA first
2      Hulk          Still being produced
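The output above keeps a doubled space where the timestamp used to sit. A minimal sketch of one way to tidy that up, using the standard `re` module on plain strings rather than a dataframe; the helper name `strip_colon_tokens` is hypothetical, and the same two patterns could presumably be passed to `Series.str.replace(..., regex=True)` instead:

```python
import re

# Pattern from the answer above: a token before a colon,
# a token after a colon, or the colon itself.
COLON_TOKEN = re.compile(r'\S+(?=[:])|(?<=[:])\S+|[:]')

def strip_colon_tokens(text):
    """Hypothetical helper: drop colon-bearing tokens, then collapse leftover whitespace."""
    without_stamp = COLON_TOKEN.sub('', text)
    # Collapse runs of whitespace to one space and trim both ends.
    return re.sub(r'\s+', ' ', without_stamp).strip()

rows = [
    'Viewed in 2021-10-04T14:30:31Z City Hall, London',
    'Aired 2012-01-04R11:01:10Z in the USA first',
    '2010-07-04S07:22:02Z Still being produced',
]
cleaned = [strip_colon_tokens(row) for row in rows]
print(cleaned)
```

Keep in mind that this pattern removes anything attached to a colon, not only timestamps, so a description such as `Runtime: 2h` would lose `Runtime:` as well.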

Upvotes: 1

drmcchamburgers

Reputation: 78

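I haven't gone as far as replicating your dataframe, but your regex is not going to work, for two reasons. First, the caret: an unescaped ^ locks the match to the beginning of the string, and the escaped \^ in your pattern looks for a literal caret character, which none of the descriptions contain. Second, you have a 'T' in there, which will only match one of those descriptions.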

try:

(\d{4}-\d{2}-\d{2}[TSR]\d{2}:\d{2}:\d{2})Z
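As a quick check of the points above, here is a small sketch that runs both patterns against the three sample descriptions from the question. It uses the standard `re` module on plain strings; the variable names are illustrative only:

```python
import re

# The asker's pattern (escaped caret, only 'T' as separator) and the suggested one.
original = re.compile(r'\^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z')
suggested = re.compile(r'(\d{4}-\d{2}-\d{2}[TSR]\d{2}:\d{2}:\d{2})Z')

samples = [
    'Viewed in 2021-10-04T14:30:31Z City Hall, London',
    'Aired 2012-01-04R11:01:10Z in the USA first',
    '2010-07-04S07:22:02Z Still being produced',
]

# Does each pattern find a timestamp in each row?
original_hits = [bool(original.search(s)) for s in samples]
suggested_hits = [bool(suggested.search(s)) for s in samples]

# Removing only the timestamp leaves the whitespace around it in place.
stripped = [suggested.sub('', s) for s in samples]

print(original_hits)
print(suggested_hits)
print(stripped)
```

The suggested pattern removes the timestamp itself but not the space next to it, so the results may still need whitespace cleanup afterwards.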

Upvotes: 1

Ryszard Czech

Reputation: 18611

\^ matches a caret symbol.

Besides T, the datetime stamps also use R and S as the separator between date and time, so those must be added to the pattern.

Use

\s*\b\d{4}-\d{2}-\d{2}[TRS]\d{2}:\d{2}:\d{2}Z\b

See proof.

EXPLANATION

--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  \d{4}                    digits (0-9) (4 times)
--------------------------------------------------------------------------------
  -                        '-'
--------------------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
  -                        '-'
--------------------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
  [TRS]                    any character of: 'T', 'R', 'S'
--------------------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
  Z                        'Z'
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
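A minimal sketch of applying this pattern to the dataframe from the question, assuming pandas is available. The trailing `.str.strip()` is an addition not in the answer itself: the pattern consumes whitespace before the timestamp but not after it, so a row that starts with the timestamp would otherwise keep a leading space.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    'Film': ['Batman', 'Superman', 'Hulk'],
    'Description': [
        'Viewed in 2021-10-04T14:30:31Z City Hall, London',
        'Aired 2012-01-04R11:01:10Z in the USA first',
        '2010-07-04S07:22:02Z Still being produced',
    ],
})

pattern = r'\s*\b\d{4}-\d{2}-\d{2}[TRS]\d{2}:\d{2}:\d{2}Z\b'

# regex=True is passed explicitly, since newer pandas versions
# treat the pattern as a literal string by default.
df['Description'] = (
    df['Description']
    .str.replace(pattern, '', regex=True)
    .str.strip()  # the Hulk row begins with the timestamp, leaving a leading space
)
print(df)
```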

Upvotes: 3
