sullymon54

Reputation: 47

How do I find text between upper-case characters in DataFrame rows?

I have a DataFrame whose strings mix upper- and lower-case characters, and I need to extract only the lower-case characters that sit between two runs of 3 upper-case characters.

I'm using Python and pandas to do this but have been unsuccessful. This is what the data looks like:

afklajrwouoivWERvalueineedREWkfjdsl

Upvotes: 0

Views: 183

Answers (2)

vlemaistre

Reputation: 3331

You can also use the re package with the same regex:

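import re

s = 'afklajrwouoivWERvalueineedREWkfjdsl'
# group(1) is the text captured between the two runs of 3 uppercase letters
re.search(r'[A-Z]{3}(.+?)[A-Z]{3}', s).group(1)

Output: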

valueineed

If there are several occurrences, use re.finditer instead:

matches = re.finditer(r'[A-Z]{3}(.+?)[A-Z]{3}', s)
results = [match.group(1) for match in matches]
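
One thing to watch for with several occurrences: the pattern above consumes the closing uppercase run, so that run cannot also open the next match. A minimal sketch of the difference, assuming a made-up input string where values share delimiters, and swapping in a lookahead and a lowercase-only group (neither is part of the answer above):

```python
import re

# Hypothetical input: the closing delimiter of one value also opens the next.
s = 'ABConeDEFtwoGHI'

# Consuming pattern: DEF is used up by the first match, so "two" is missed.
consuming = re.findall(r'[A-Z]{3}([a-z]+)[A-Z]{3}', s)

# Lookahead pattern: the closing delimiter is checked but not consumed,
# so it is still available to open the next match.
shared = re.findall(r'[A-Z]{3}([a-z]+)(?=[A-Z]{3})', s)

print(consuming)  # ['one']
print(shared)     # ['one', 'two']
```

If the delimiters are never shared in your data, the original pattern should behave the same way.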

Upvotes: 1

Scott Boston

Reputation: 153500

Let's try this:

import pandas as pd

df = pd.DataFrame({'text': ['afklajrwouoivWERvalueineedREWkfjdsl']}, index=[0])

# The capture group becomes column 0 of the returned DataFrame
df['text'].str.extract(r'[A-Z]{3}(.+?)[A-Z]{3}')

Output:

            0
0  valueineed

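Note that this captures any characters between two runs of 3 uppercase letters, not only lowercase ones. If only lowercase letters should match, replace .+? with [a-z]+ in the capture group.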
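
Since the question asks for lowercase values only, here is a rough sketch of a stricter variant that also writes the result back to the DataFrame. The second row and the column name value are made up for illustration, and expand=False is used only to get a Series rather than a one-column DataFrame:

```python
import pandas as pd

# Sample data; the second row is a made-up case with no uppercase delimiters.
df = pd.DataFrame({'text': ['afklajrwouoivWERvalueineedREWkfjdsl',
                            'nodelimitershere']})

# [a-z]+ restricts the captured group to lowercase letters only.
# expand=False returns a Series, which can be assigned to a new column.
df['value'] = df['text'].str.extract(r'[A-Z]{3}([a-z]+)[A-Z]{3}', expand=False)

print(df['value'].tolist())
```

Rows that do not match the pattern get NaN in the new column, so they can be filtered out afterwards with dropna if needed.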

Upvotes: 2
