Reputation: 137
I have a Markdown file like the one below:
#2016-12-24
| Word | Explanation | Example sentence |
| --------- | -------- | --------- |
|**accelerator;**| - | - |
|**compass**| - | - |
|**wheels**| - | - |
|**fabulous**| - | - |
|**sweeping**| - | - |
|**prospect**| - | - |
|**pumpkin**| - | - |
|**trolley**| - | - |
|snapped,**| - | - |
|tip| - | - |
|lap| - | - |
|tether.| - | - |
|damp| - | - |
|triumphant| - | - |
|sarcastic| - | - |
|missed out| - | - |
|sidekick| - | - |
|considerable| - | - |
|Willow.| - | - |
|eagle.| - | - |
|considerably.| - | - |
|flat.| - | - |
|feast| - | - |
|scramble| - | - |
|turned up| - | - |
|rounded off| - | - |
|rat| - | - |
|resembled| - | - |
|By the time she had clambered back into the car,| - | - |
|By the time she had clambered back into the car, they were running very late,| - | - |
|wheeled his trolley| - | - |
|barrier,| - | - |
|bounced| - | - |
|in blazes| - | - |
|clutching| - | - |
|sealed| - | - |
|stunned.| - | - |
|‘We’re stuck,| - | - |
|marched off| - | - |
|accelerator| - | - |
|and the prospect of seeing Fred and George’s jealous faces| - | - |
|protest.| - | - |
|in protest.| - | - |
|horizon,| - | - |
|knuckles| - | - |
|metal| - | - |
|thick| - | - |
|reached the end of its tether.| - | - |
|Artefacts| - | - |
|blurted out.| - | - |
|gaped| - | - |
|I will be writing to both your families tonight.| - | - |
|‘Can you believe our luck, though?’| - | - |
|‘Skip the lecture,’| - | - |
|people’ll be talking about that one for years!’| - | - |
|nudged| - | - |
|‘I know I shouldn’t’ve enjoyed that or anything, but –’| - | - |
|dashed| - | - |
I'd like to extract only the sentence-like entries, that is, the cells that contain more than one word.
I tried to do this on the regex101 website, but each time my pattern matches every row.
Can anyone help me, please?
Upvotes: 2
Views: 326
Reputation: 10476
Try this:
^\|[^\w\|]*(\w+\s+(?=\w+)[^\|]*)
^\| : matches only if the line starts with a pipe (|)
[^\w\|]* : skips any characters that are neither word characters (letters, digits, underscore) nor a pipe
\w+\s+ : requires a word followed by one or more whitespace characters
(?=\w+) : then checks that another word follows
[^\|]* : if the previous conditions are met, grabs everything up to the next pipe (|)
For each match, group 1 contains the sentence you want.
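As a rough illustration, the pattern above can be tried in Python. This is only a sketch: the thread does not say which regex flavour is being used, and `SAMPLE` and `extract_multiword_cells` are hypothetical names introduced here. The multiline flag is assumed so that `^` matches at the start of every row.

```python
import re

# Pattern from the answer; re.MULTILINE makes ^ match at each line start.
PATTERN = re.compile(r"^\|[^\w\|]*(\w+\s+(?=\w+)[^\|]*)", re.MULTILINE)

# Hypothetical excerpt of the table from the question.
SAMPLE = """\
| Word | Explanation | Example |
| --------- | -------- | --------- |
|**compass**| - | - |
|tip| - | - |
|tether.| - | - |
|missed out| - | - |
|wheeled his trolley| - | - |
|‘We’re stuck,| - | - |
|‘Skip the lecture,’| - | - |
"""


def extract_multiword_cells(text):
    """Return group 1 of every match, i.e. the multi-word first cells."""
    return [m.group(1) for m in PATTERN.finditer(text)]


for cell in extract_multiword_cells(SAMPLE):
    print(cell)
```

On this excerpt the single-word rows are skipped as intended. Two things are worth checking against your own data: group 1 starts at the first word character, so a leading ‘ is not included, and a row such as ‘We’re stuck, is not matched because the first word is followed by ’ rather than by whitespace.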
Upvotes: 1
Reputation: 43199
You could come up with:
^\|                     # start of line, followed by |
(                       # capture the "words"
  (?:[‘\w]+             # a non-capturing group and at least one of \w or ‘
    (?:[^|\w\n\r]+      # followed by NOT one of these
    |                   # or
    (?=\|)              # make sure there's a | straight ahead
    )
  ){2,})                # repeat the construct at least 2 times
\|
See a demo on regex101.com (and mind the modifiers!).
This will capture at least two consecutive words. If you need more, change the number inside the {} braces.
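A minimal sketch of how this commented pattern might be run in Python follows. It assumes the multiline and verbose flags are the modifiers in question (the pattern relies on `^` matching per line and on inline comments); `SAMPLE` and `extract_sentences` are hypothetical names used only for illustration.

```python
import re

# The commented pattern from the answer; VERBOSE allows the layout and
# comments, MULTILINE lets ^ match at the start of every row.
PATTERN = re.compile(
    r"""
    ^\|                     # start of line, followed by |
    (                       # capture the "words"
      (?:[‘\w]+             # at least one of \w or ‘
        (?:[^|\w\n\r]+      # followed by characters that are not |, \w or a line break
        |                   # or
        (?=\|)              # a | straight ahead
        )
      ){2,}                 # repeat the construct at least 2 times
    )
    \|
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical excerpt of the table from the question.
SAMPLE = """\
| Word | Explanation | Example |
| --------- | -------- | --------- |
|**compass**| - | - |
|tip| - | - |
|tether.| - | - |
|missed out| - | - |
|wheeled his trolley| - | - |
|‘We’re stuck,| - | - |
|‘Skip the lecture,’| - | - |
"""


def extract_sentences(text):
    """Return the captured first cell of every row with two or more words."""
    return [m.group(1) for m in PATTERN.finditer(text)]


for sentence in extract_sentences(SAMPLE):
    print(sentence)
```

Raising the lower bound in `{2,}` (for example to `{3,}`) would keep only cells with at least three word-like chunks. Note that an apostrophe splits a word into two chunks for this pattern, so a contraction counts as two.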
Upvotes: 0