Reputation: 1815
How can I match everything, including special characters, between exactly 3 white spaces to the left and exactly 3 white spaces to the right of a colon? In the examples, each white space is denoted as \s.
Example match:
\s\s\sdata\sstuff:\s\sfound\ssome([%$)Data\sas\swhiteSpace\s\s\s
data stuff: found some([%$)Data as whiteSpace
Example nonMatch:
\s\sdata\sstuff:\s\sfound\sno\sdatacause\sno\s3\sspaces\sbefore\sor\safter\s\s
data stuff: found no datacause no 3 spaces before or after
The intent is to use the matches to split a single column of a pandas DataFrame into separate columns.
Expected output:
data stuff                                 | data stuff 2
found some([%$)Data as whiteSpace          | (if I had more examples for data stuff 2, they would show here)
extra random data to add into an output df | (if I had more examples for data stuff 2, they would show here)
My original thought was to use something like this, but it doesn't quite work:
"(\\s\\s\\s(.*?)\\:\\s\\s(.*?)\\s\\s\\s)"
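One way the two-column goal above could be approached is sketched below. It assumes each "key:  value" chunk is wrapped in exactly three white spaces and that the text before the colon should become the column name. It uses Series.str.extractall with named groups and then unstacks the keys into columns; the sample rows (including the hypothetical "data stuff 2" row) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample column. Rows 0 and 2 mirror the match and non-match
# examples; row 3 is made up to show a second key ("data stuff 2").
df = pd.DataFrame({"col": [
    "   data stuff:  found some([%$)Data as whiteSpace   ",
    "   data stuff:  extra random data to add into an output df   ",
    "  data stuff:  found no datacause no 3 spaces before or after  ",
    "   data stuff:  x1   data stuff 2:  y1   ",
]})

# (?<!\s)\s{3}(?!\s)      exactly three white spaces open the chunk
# (?P<key>[^:]+):\s+      text up to the colon becomes the column name
# (?P<value>.*?)(?<!\s)   lazily capture the value, special characters included
# (?=\s{3}(?!\s))         exactly three white spaces close it; a lookahead,
#                         so the next chunk can reuse them as its opening spaces
pattern = r"(?<!\s)\s{3}(?!\s)(?P<key>[^:]+):\s+(?P<value>.*?)(?<!\s)(?=\s{3}(?!\s))"

# One row per match, indexed by (original row, match number).
matches = df["col"].str.extractall(pattern)

# Turn each key into its own column; rows without a match come back as NaN.
wide = (
    matches.droplevel("match")
    .set_index("key", append=True)["value"]
    .unstack("key")
    .reindex(df.index)
)
print(wide)
```

The (?<!\s) and (?!\s) guards are what make the count "exactly 3": a run of four or more white spaces fails them, so the non-match row ends up as NaN.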
Upvotes: 0
Views: 1752
Reputation: 38415
Consider this df
col
0 data stuff: found some([%$)Data as whiteSpace 1
Regex1:
df.col.str.extract(r':\s{3}(.*)\s{3}')
returns
0 found some([%$)Data as whiteSpace
Name: col, dtype: object
That is, the greedy .* captures the content between the three white spaces before found and the last three white spaces at the end, just before the 1.
Whereas
df.col.str.extract(r':\s{3}(.*?)\s{3}')  # note the ? after .*
will return
0 found
Name: col, dtype: object
That is, the lazy .*? stops at the first run of three white spaces it meets, so it captures only the content between the first and second runs of three white spaces.
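To see the greedy/lazy difference without the page's collapsed whitespace, the sketch below rebuilds the one-row df with explicit spaces. The exact spacing is an assumption (three spaces after the colon, after found, and before the trailing 1). Current pandas versions return a DataFrame from str.extract by default, so expand=False is passed to get a Series like the outputs shown above.

```python
import pandas as pd

# Hypothetical rebuild of the one-row df; the page collapsed the whitespace,
# so the spacing here (three spaces after ":", after "found" and before "1")
# is an assumption.
df = pd.DataFrame({"col": ["data stuff:   found   some([%$)Data as whiteSpace   1"]})

# Greedy .* runs to the end, then backtracks to the LAST run of three spaces.
greedy = df.col.str.extract(r":\s{3}(.*)\s{3}", expand=False)

# Lazy .*? stops at the FIRST run of three spaces after the colon.
lazy = df.col.str.extract(r":\s{3}(.*?)\s{3}", expand=False)

print(repr(greedy[0]))  # 'found   some([%$)Data as whiteSpace'
print(repr(lazy[0]))    # 'found'
```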
If you provide more test cases, it will become clearer what else you need the regex to do.
Upvotes: 1
Reputation: 4536
(?:^|[^ ])   (.*?)   (?:$|[^ ])
Break it down!
(?:^|[^ ])
- Matches either the beginning of a line or any single character that is not a space.
x   (.*?)   x
- Lazily matches anything that is between 3 spaces on either side (the x's are added only so the spaces don't disappear).
(?:$|[^ ])
- Matches either the end of a line or any single character that is not a space.
Upvotes: 0
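A quick way to check this pattern against the question's two examples is re.findall, which returns just the (.*?) group. The example strings assume each \s in the question stands for a single space. Note that, as written, the pattern does not enforce exactly three spaces: with four leading spaces, the extra space ends up inside the capture, as the last call shows.

```python
import re

# The pattern above, with the x markers removed and three literal spaces
# on each side of the lazy group.
pattern = r"(?:^|[^ ])   (.*?)   (?:$|[^ ])"

match_example = "   data stuff:  found some([%$)Data as whiteSpace   "
non_match_example = "  data stuff:  found no datacause no 3 spaces before or after  "

print(re.findall(pattern, match_example))      # ['data stuff:  found some([%$)Data as whiteSpace']
print(re.findall(pattern, non_match_example))  # []
print(re.findall(pattern, "    data:  x   "))  # [' data:  x']
```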