Nev1111

Reputation: 1049

Pandas extract substring

I have a large text file with strings contained in a single column (["string"]), like in the example below:

    string
0   7 ABC MAGAZINE                                          51               09/14/2000 09/14/2000                .00
1   ABC Magazine                      970-663-4007                               0    00/00/0000                .00

My goal is to extract the line items that contain exactly one sequence resembling a date in the format "mm/dd/yyyy" into a separate dataframe.

    result
0   ABC Magazine                      970-663-4007                               0    00/00/0000                .00

I tried using regex, but both lines got selected instead of just one. How can I avoid this?

What I tried:

df_['result']=df['string'].str.extract('(.*\d\d/\\d\d/\\d\d\d\d.*)')

Upvotes: 1

Views: 41

Answers (1)

mozway

Reputation: 260300

You can use a regex with str.count to count the date-like matches in each row and keep only the rows that have exactly one:

regex = r'\d\d/\d\d/\d{4}'

out = df[df['string'].str.count(regex).eq(1)]

Output:

                                              string
1  ABC Magazine                      970-663-4007...
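For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The sample rows are modelled on the question with the whitespace shortened, and the names `date_pattern` and `date_counts` are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample rows modelled on the question; spacing shortened for readability.
df = pd.DataFrame({
    "string": [
        "7 ABC MAGAZINE   51   09/14/2000 09/14/2000   .00",
        "ABC Magazine   970-663-4007   0   00/00/0000   .00",
    ]
})

# Raw string so the backslashes reach the regex engine unchanged.
date_pattern = r"\d\d/\d\d/\d{4}"

# str.count returns the number of non-overlapping matches in each row.
date_counts = df["string"].str.count(date_pattern)

# Keep only the rows that contain exactly one date-like token.
out = df[date_counts.eq(1)]
print(out)
```

The pattern in the question, with `.*` on both sides, matches any row that has at least one date-like token, which is why both rows were selected. Counting the matches and comparing to 1 is what separates the two rows here: the first row has two dates, the second has one.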

Upvotes: 2
