Reputation: 439
I have a dataframe that looks like this:
   Film      Description
0  Batman    Viewed in 2021-10-04T14:30:31Z City Hall, London
1  Superman  Aired 2012-01-04R11:01:10Z in the USA first
2  Hulk      2010-07-04S07:22:02Z Still being produced
I want to remove the date-time from each row in the 'Description' column, to look like this:
   Film      Description
0  Batman    Viewed in City Hall, London
1  Superman  Aired in the USA first
2  Hulk      Still being produced
I have attempted this string regex:
df['Description'] = df['Description '].str.replace(r'\^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z', '')
Upvotes: 2
Views: 573
Reputation: 26676
Use str.replace to remove any of the following:
- any run of non-whitespace characters before a colon,
- OR any run of non-whitespace characters after a colon,
- OR the colon itself.
df['Description'] = df['Description'].str.replace(r'\S+(?=[:])|(?<=[:])\S+|[:]', '', regex=True)
print(df)
Film Description
0 Batman Viewed in City Hall, London
1 Superman Aired in the USA first
2 Hulk Still being produced
Upvotes: 1
Reputation: 78
I haven't gone as far as replicating your dataframe, but your regex is not going to work as written. The escaped caret \^ matches a literal caret character (and an unescaped ^ would lock the match to the beginning of the string), and you have a hard-coded 'T' in there, which will only match one of those descriptions.
Try:
(\d{4}-\d{2}-\d{2}[TSR]\d{2}:\d{2}:\d{2})Z
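
As a quick check of the pattern above, here is a minimal sketch using only Python's re module on the three sample descriptions from the question. The names TIMESTAMP and strip_timestamp are hypothetical helpers introduced for illustration, and the whitespace clean-up step is an addition, since removing the timestamp alone leaves a doubled or leading space behind.

```python
import re

# Pattern from this answer: date, one of T/S/R as separator, time, then a literal Z.
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2}[TSR]\d{2}:\d{2}:\d{2})Z")

def strip_timestamp(text: str) -> str:
    """Remove the timestamp, then collapse the whitespace it leaves behind."""
    without_stamp = TIMESTAMP.sub("", text)
    # split() with no arguments drops leading/trailing and repeated whitespace.
    return " ".join(without_stamp.split())

samples = [
    "Viewed in 2021-10-04T14:30:31Z City Hall, London",
    "Aired 2012-01-04R11:01:10Z in the USA first",
    "2010-07-04S07:22:02Z Still being produced",
]
cleaned = [strip_timestamp(s) for s in samples]
print(cleaned)
```

The same function could be applied to the column with df['Description'].apply(strip_timestamp), assuming every value in the column is a string.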
Upvotes: 1
Reputation: 18611
\^ matches a caret symbol.
Other than T, I see R and S in the datetime stamps; they must be added.
Use
\s*\b\d{4}-\d{2}-\d{2}[TRS]\d{2}:\d{2}:\d{2}Z\b
See proof.
EXPLANATION
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
\d{4} digits (0-9) (4 times)
--------------------------------------------------------------------------------
- '-'
--------------------------------------------------------------------------------
\d{2} digits (0-9) (2 times)
--------------------------------------------------------------------------------
- '-'
--------------------------------------------------------------------------------
\d{2} digits (0-9) (2 times)
--------------------------------------------------------------------------------
[TRS] any character of: 'T', 'R', 'S'
--------------------------------------------------------------------------------
\d{2} digits (0-9) (2 times)
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
\d{2} digits (0-9) (2 times)
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
\d{2} digits (0-9) (2 times)
--------------------------------------------------------------------------------
Z 'Z'
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
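
A sketch of how this pattern might be applied to the dataframe from the question follows. The dataframe is rebuilt by hand from the sample rows, regex=True is passed explicitly because recent pandas versions treat the pattern as a literal string by default, and the trailing .str.strip() is an addition to handle the row where the timestamp comes first.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Film": ["Batman", "Superman", "Hulk"],
    "Description": [
        "Viewed in 2021-10-04T14:30:31Z City Hall, London",
        "Aired 2012-01-04R11:01:10Z in the USA first",
        "2010-07-04S07:22:02Z Still being produced",
    ],
})

# \s* also consumes the whitespace in front of the timestamp,
# so no doubled space is left in the middle of the text.
pattern = r"\s*\b\d{4}-\d{2}-\d{2}[TRS]\d{2}:\d{2}:\d{2}Z\b"

df["Description"] = (
    df["Description"]
    .str.replace(pattern, "", regex=True)
    .str.strip()  # removes the leading space left when the timestamp came first
)
print(df)
```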
Upvotes: 3