johnnyb

Reputation: 1815

pandas regex match all items between two sets of white spaces

How can I match everything, including special characters, that sits between exactly 3 whitespace characters on the left and exactly 3 on the right, around a colon? In the examples below, each whitespace character is written as \\s.

Example match:

\\s\\s\\sdata\\sstuff:\\s\\sfound\\ssome([%$)Data\\sas\\swhiteSpace\\s\\s\\s
   data stuff:  found some([%$)Data as whiteSpace   

Example nonMatch:

\\s\\sdata\\sstuff:\\s\\sfound\\sno\\sdatacause\\sno\\s3\\sspaces\\sbefore\\sor\\safter\\s\\s
  data stuff:  found no datacause no 3 spaces before or after   

The intent is to expand this into separate columns from a single column of a pandas DataFrame.

Expected output:

data stuff                                data stuff 2
found some([%$)Data as whiteSpace         if i had more examples for data stuff 2 it would show here
extra random data to add into a outputdf  if i had more examples for data stuff 2 it would show here

My original thought was to use something like this, but it doesn't quite work.

"(\\s\\s\\s(.*?)\\:\\s\\s(.*?)\\s\\s\\s)"

Upvotes: 0

Views: 1752

Answers (2)

Vaishali

Reputation: 38415

Consider this df

    col
0   data stuff:   found   some([%$)Data as whiteSpace   1

Regex1:

df.col.str.extract(r':\s{3}(.*)\s{3}', expand=False)  # raw string; expand=False returns a Series

would return

0    found   some([%$)Data as whiteSpace
Name: col, dtype: object

That is the content between the three whitespace characters before "found" and the three at the end before "1". The greedy .* runs to the last possible run of three whitespace characters.

Whereas

df.col.str.extract(r':\s{3}(.*?)\s{3}', expand=False)  # note the ? after .*

will return

0    found
Name: col, dtype: object

That is the content between the first and second instances of three whitespace characters, because the lazy .*? stops at the earliest possible match.

If you provide more test cases, it will become clearer what else you need the regex to do.
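The greedy versus lazy difference can be checked without a DataFrame. This is a minimal sketch using the standard-library re module, which is what pandas' str.extract relies on. The sample string is an assumption: it is the row above retyped with exactly three spaces after the colon, after "found", and before the trailing "1".

```python
import re

# Assumed sample row: three spaces after ":", after "found", and before "1".
text = "data stuff:   found   some([%$)Data as whiteSpace   1"

# Greedy: .* grabs as much as possible, then backtracks to the LAST
# run of three whitespace characters.
greedy = re.search(r":\s{3}(.*)\s{3}", text)

# Lazy: .*? grabs as little as possible, stopping at the FIRST
# run of three whitespace characters.
lazy = re.search(r":\s{3}(.*?)\s{3}", text)

print(repr(greedy.group(1)))
print(repr(lazy.group(1)))
```

Which of the two is appropriate depends on whether runs of three spaces can occur inside the value itself.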

Upvotes: 1

Aaron Brock

Reputation: 4536

(?:^|[^ ])   (.*?)   (?:$|[^ ])

Break it down!

  • (?:^|[^ ]) - Match either the beginning of the line or any character that is not a space
  • x   (.*?)   x - Match anything that is between 3 spaces on either side (x's added so the spaces don't disappear)
  • (?:$|[^ ]) - Match either the end of the line or any character that is not a space
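As a rough check, the pattern can be tried against strings shaped like the two examples in the question. This is only a sketch with Python's re module. The two test strings are assumptions retyped from the question (three leading and trailing spaces for the match, two of each for the non-match), and the variable names are made up for illustration.

```python
import re

# The pattern from this answer, written as a raw string.
pattern = re.compile(r"(?:^|[^ ])   (.*?)   (?:$|[^ ])")

# Assumed inputs modelled on the question's examples.
should_match = "   data stuff:  found some([%$)Data as whiteSpace   "
should_not_match = "  data stuff:  found no datacause no 3 spaces before or after  "

hit = pattern.search(should_match)
miss = pattern.search(should_not_match)

print(repr(hit.group(1)) if hit else None)
print(miss)
```

Note that the captured group here still contains the "data stuff:" label, so splitting it into a column name and a value would need a further step, for example splitting on the first colon.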

Example in regexr

Upvotes: 0
