pandas get a value out of text using regex

I have text like this:

text = 'Ronald Mayr: A\nBell Kassulke: B\nJacqueline Rupp: A \nAlexander Zeller: C\nValentina Denk: C \nSimon Loidl: A \nElias Jovanovic: B \nStefanie Weninger: B \nFabian Peer: C \nHakim Botros: B\nEmilie Lorentsen: B\n'

I need to get all the names that have the value ": B", for example Bell Kassulke and Elias Jovanovic.

I'm trying something like this:

import re
stu = re.findall(r'\w+.*.: B', text)

but this gives me the whole matches, like this:

['Bell Kassulke: B',
 'Elias Jovanovic: B',
 'Stefanie Weninger: B',
 'Hakim Botros: B',
 'Emilie Lorentsen: B']

But I only need the names, not the whole matched strings. What can I do?

Upvotes: 2

Answers (3)

cherry

Reputation: 356

Try this. Here is how the pattern works:

'(' starts a capturing group

\w+

matches any word character (for ASCII text this is [a-zA-Z0-9_]; in Python 3 str patterns it also matches Unicode letters and digits)

Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)

.*

matches any character (except for line terminators)

Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)

.

matches exactly one more character (except for line terminators)

')' ends the capturing group

: B

matches the characters : B literally (case sensitive)

import re
pattern = r'(\w+.*.): B'
re.findall(pattern, text)
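A self-contained sketch of this answer, with the text string from the question copied in. Because the pattern contains exactly one capturing group, re.findall returns only what that group matched, so the ": B" part is dropped automatically:

```python
import re

# The sample text from the question.
text = ('Ronald Mayr: A\nBell Kassulke: B\nJacqueline Rupp: A \nAlexander Zeller: C\n'
        'Valentina Denk: C \nSimon Loidl: A \nElias Jovanovic: B \nStefanie Weninger: B \n'
        'Fabian Peer: C \nHakim Botros: B\nEmilie Lorentsen: B\n')

# With one capturing group, findall returns the group's text, not the whole match.
# '.' does not match '\n', so every match stays within a single line.
names = re.findall(r'(\w+.*.): B', text)
print(names)
# ['Bell Kassulke', 'Elias Jovanovic', 'Stefanie Weninger', 'Hakim Botros', 'Emilie Lorentsen']
```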

Upvotes: 1

Wiktor Stribiżew

Reputation: 627101

You can use

^(.*?):\s*B\s*$


Details

^ - start of the string (with re.M / re.MULTILINE, start of any line)
(.*?) - Group 1 (the value that .findall returns): zero or more characters other than line breaks, as few as possible
: - a colon
\s*B\s* - a B surrounded by zero or more whitespace characters
$ - end of the string (with re.M / re.MULTILINE, end of any line)
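A sketch applying this pattern directly to the plain text string from the question (copied in below). Since ^ and $ anchor to the whole string by default, re.findall only finds the per-line matches when the re.MULTILINE flag is passed:

```python
import re

# The sample text from the question.
text = ('Ronald Mayr: A\nBell Kassulke: B\nJacqueline Rupp: A \nAlexander Zeller: C\n'
        'Valentina Denk: C \nSimon Loidl: A \nElias Jovanovic: B \nStefanie Weninger: B \n'
        'Fabian Peer: C \nHakim Botros: B\nEmilie Lorentsen: B\n')

pattern = r'^(.*?):\s*B\s*$'

# Without flags, ^ and $ refer to the whole string, so nothing matches here.
without_flag = re.findall(pattern, text)
print(without_flag)  # []

# re.MULTILINE makes ^ and $ match at the start and end of every line.
names = re.findall(pattern, text, flags=re.MULTILINE)
print(names)
```

One caveat: \s also matches '\n'. It is harmless on this data, but [ \t]* instead of \s* would keep a match from ever spilling across lines.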

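In pandas, if each cell holds the whole multi-line text, pass re.M so that ^ and $ match at every line:

import re
df['Col name here'].str.findall(r'^(.*?):\s*B\s*$', flags=re.M).str.join(',')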

Or, if each cell holds a single "Name: grade" entry and you need one match per value:

df['Results'] = df['Col name here'].str.extract(r'^(.*?):\s*B\s*$', expand=False)
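A small runnable sketch of both pandas variants. The question does not show its DataFrame, so the column name 'line' and the two layouts below (one entry per row, or the whole text in one cell) are assumptions for illustration:

```python
import re
import pandas as pd

# The sample text from the question.
text = ('Ronald Mayr: A\nBell Kassulke: B\nJacqueline Rupp: A \nAlexander Zeller: C\n'
        'Valentina Denk: C \nSimon Loidl: A \nElias Jovanovic: B \nStefanie Weninger: B \n'
        'Fabian Peer: C \nHakim Botros: B\nEmilie Lorentsen: B\n')

# Assumed layout 1: one "Name: grade" entry per row (hypothetical column 'line').
df = pd.DataFrame({'line': text.splitlines()})
# expand=False with a single group returns a Series; non-matching rows become NaN.
df['Results'] = df['line'].str.extract(r'^(.*?):\s*B\s*$', expand=False)
print(df['Results'].dropna().tolist())

# Assumed layout 2: the whole multi-line text in one cell.
# ^ and $ need re.MULTILINE here to anchor at each line.
s = pd.Series([text])
joined = s.str.findall(r'^(.*?):\s*B\s*$', flags=re.MULTILINE).str.join(',')
print(joined[0])
```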

Upvotes: 2

gtomer

Reputation: 6574

You can add this line of code after your regex:

stu = [s.replace(': B', '') for s in stu]
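Put together with the regex from the question, a minimal sketch (the question's text string is copied in):

```python
import re

# The sample text from the question.
text = ('Ronald Mayr: A\nBell Kassulke: B\nJacqueline Rupp: A \nAlexander Zeller: C\n'
        'Valentina Denk: C \nSimon Loidl: A \nElias Jovanovic: B \nStefanie Weninger: B \n'
        'Fabian Peer: C \nHakim Botros: B\nEmilie Lorentsen: B\n')

# Full matches, as in the question, e.g. 'Bell Kassulke: B'.
stu = re.findall(r'\w+.*.: B', text)

# Remove the ': B' part from each match to keep only the name.
stu = [s.replace(': B', '') for s in stu]
print(stu)
```

Note that str.replace removes every occurrence of ': B' in a string; on Python 3.9+, s.removesuffix(': B') removes it only from the end.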

Upvotes: 0
