Reputation: 1186
I have a pandas DataFrame like this:
idx name
1 "NM_014855.2(AP5Z1):c.80_83delGGATinsTGCTGTAAACTGTAACTGTAAA (p.Arg27_Ala362delinsLeuLeuTer)"
2 "NM_014630.2(ZNF592):c.3136G>A (p.Gly1046Arg)"
3 "NM_000410.3(HFE):c.892+48G>A"
4 "NC_000014.9:g.(31394019_31414809)_(31654321_31655889)del"
I need to extract whatever follows the ':' character, until any of the following:
" ("
"del"
{end of string}
I have tried the following:
df.str.extract(r"\):(.*) \(|\n")
But it doesn't work for all the cases.
How can I properly specify the condition I need?
Upvotes: 0
Views: 3186
Reputation: 7095
Use a lazy match *? to minimize how much the .* will capture, then specify the stop conditions you're looking for:
df["name"].str.extract(r":(.*?)(?: \(|del|$)")
Regex quantifiers are greedy by default, so .* matches as much as it can; appending ? makes the quantifier lazy, so it matches as little as it can before one of the stop conditions is met.
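Here is a small sketch to check the idea against the four rows from the question. It assumes the strings live in a column called name (as in the question's table) and that the quotes shown there are not part of the data. The pattern includes the space before the parenthesis so that it matches the " (" stop condition exactly as the question states it.

```python
import pandas as pd

# Sample rows copied from the question; the column name "name" is taken
# from the question's table, and the surrounding quotes are assumed not
# to be part of the actual values.
df = pd.DataFrame(
    {
        "name": [
            "NM_014855.2(AP5Z1):c.80_83delGGATinsTGCTGTAAACTGTAACTGTAAA (p.Arg27_Ala362delinsLeuLeuTer)",
            "NM_014630.2(ZNF592):c.3136G>A (p.Gly1046Arg)",
            "NM_000410.3(HFE):c.892+48G>A",
            "NC_000014.9:g.(31394019_31414809)_(31654321_31655889)del",
        ]
    },
    index=[1, 2, 3, 4],
)

# ":"      the first colon in the string
# (.*?)    lazy capture: as few characters as possible
# (?:...)  non-capturing group listing the stop conditions:
#          " (" (space + parenthesis), "del", or end of string
pattern = r":(.*?)(?: \(|del|$)"

# expand=False returns a Series because the pattern has one capture group
extracted = df["name"].str.extract(pattern, expand=False)
result = extracted.tolist()
print(result)
# ['c.80_83', 'c.3136G>A', 'c.892+48G>A',
#  'g.(31394019_31414809)_(31654321_31655889)']
```

The space matters for row 4: with a bare \( the lazy match would stop at the parenthesis right after "g." and return only "g.", and row 2 would keep a trailing space.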
Upvotes: 2