Reputation: 127

Need to extract contents depending on their title using Python

I need to extract text depending on its title. In the example below, I need to display the Experience field. Assume I have a text file, ab.text, with data like this:

Name: xyz
Experience: 
123 company 2016-2017
567 company 2017-2018
yzx company 2018-2019

Skills:
Python, MachineLearning, Java.

Now I need to read this text file and display only the text that is under the Experience field. Note: the order of Name, Experience and Skills may vary. I am new to Python, so please help me with this.

Expected Output:

Experience: 
123 company 2016-2017
567 company 2017-2018
yzx company 2018-2019

Upvotes: 3

Answers (3)

Swadhikar

Reputation: 2210

This will do the trick

Code

import re
with open('ab.text') as f:  # file name taken from the question
    text = f.read()

matches = re.findall(r'^Experience:.*[(\d+ \w+ \d+\-\d+)\n]+$', text, re.M)
for match in matches:
    print(match.strip())
    print()

Explanation

^Experience

signifies that our match should start at the beginning of a line, with the word Experience

[(\d+ \w+ \d+-\d+)\n]+

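will match lines such as 123 company 2016-2017, one or more times. Strictly speaking, the square brackets form a character class rather than a group, so this matches one or more characters drawn from digits, word characters, spaces, hyphens, parentheses, plus signs and newlines. That is why a line like yzx company 2018-2019 is matched too.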

$

at the end requires the match to finish at the end of a line, so it stops once the lines shaped like 123 company 2016-2017 are exhausted

re.M

makes ^ and $ match at the start and end of every line of the input, rather than only at the start and end of the whole string
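
As a small illustration of the two points above (what re.M changes, and how square brackets behave), here is a minimal sketch. The sample string and variable names are made up for the demonstration and are not part of the answer's code.

```python
import re

sample = "Name: xyz\nExperience:\n123 company 2016-2017\n\nSkills:\nPython"

# Without re.M, ^ only matches at the very start of the string,
# so a header on the second line is not found.
without_flag = re.findall(r'^Experience:', sample)

# With re.M, ^ also matches right after every newline.
with_flag = re.findall(r'^Experience:', sample, re.M)

# [...] is a character class: it matches any single listed character,
# so [(\d+)] also accepts a literal '(', '+' or ')'.
char_class = re.fullmatch(r'[(\d+)]+', '(+)')

print(without_flag)            # []
print(with_flag)               # ['Experience:']
print(char_class is not None)  # True
```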

Upvotes: 1

Andrej Kesely

Reputation: 195468

You could use the re module and parse the text with it:

data = '''Name: xyz
Experience:
123 company 2016-2017
567 company 2017-2018
yzx company 2018-2019

Skills:
Python, MachineLearning, Java.'''

import re

# Step 1. Split the string on every "Title:" found at the start of a line
s = [g.strip() for g in re.split(r'^(\w+):', data, flags=re.M) if g.strip()]
# s = ['Name', 'xyz', 'Experience', '123 company 2016-2017\n567 company 2017-2018\nyzx company 2018-2019', 'Skills', 'Python, MachineLearning, Java.']

# Step 2. Convert the split string to a dictionary
d = dict(zip(s[::2], s[1::2]))
# d = {'Name': 'xyz', 'Experience': '123 company 2016-2017\n567 company 2017-2018\nyzx company 2018-2019', 'Skills': 'Python, MachineLearning, Java.'}

print(d['Experience'])

Prints:

123 company 2016-2017
567 company 2017-2018
yzx company 2018-2019
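
Since the question says the order of the fields may vary, the split-and-zip idea above could be wrapped in a small function. This is only a sketch under a few assumptions: extract_section is a hypothetical name, every title is a single word followed by a colon at the start of a line, and no ordinary content line starts with such a word-and-colon prefix.

```python
import re

def extract_section(text, title):
    """Return the text under `title`, or None if the title is absent."""
    parts = re.split(r'^(\w+):', text, flags=re.M)
    # parts[0] is whatever precedes the first title;
    # after it, titles and bodies alternate.
    sections = {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}
    return sections.get(title)

# Same kind of data as in the question, with the fields in a different order
sample = '''Skills:
Python, MachineLearning, Java.

Experience:
123 company 2016-2017
567 company 2017-2018

Name: xyz'''

result = extract_section(sample, 'Experience')
print(result)
```

Pairing titles and bodies by position, instead of filtering out empty strings first, keeps the pairs aligned even when a section has no content.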

Upvotes: 3

Col Bates - collynomial

Reputation: 662

I think the problem you've set is not very well defined, but based on the example file you provided, the code below will work. You should learn a little about file I/O, list methods and list comprehensions to understand it better. I've tried to structure it so that you can run it one line at a time and inspect what each line does, so the code doesn't seem like magic.

with open('C:/ab.text') as f:  # change to the path of your file
    contents = f.read()  # read the contents
contents = contents.split('\n')  # turn the text into a list of lines
contents = [x.strip() for x in contents]  # remove whitespace from each line
# below we slice the list so it starts at the 'Experience:' row
contents = contents[contents.index('Experience:'):]
# make a list of the positions of all the lines containing a colon ':'
colon_places = [i for i, x in enumerate(contents) if x.find(':') > 0]

# if there is only one colon, it belongs to 'Experience:' itself and we keep everything;
# if there is more than one, we only want to go as far as the second
if len(colon_places) > 1:
    contents = contents[:colon_places[1]]

# finally, we throw out the header 'Experience:' and any empty rows
Experience = [x for x in contents if x not in ['Experience:', '']]
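
The same steps could also be collected into a reusable function that does not fail when the header is missing (the list method index raises ValueError in that case). This is a rough sketch rather than part of the code above: section_lines is a hypothetical name, and returning an empty list for a missing header is an assumption about what you would want.

```python
def section_lines(lines, header):
    """Return the non-empty lines between `header` and the next line containing a colon."""
    lines = [line.strip() for line in lines]
    if header not in lines:
        return []  # assumption: a missing header yields an empty list
    result = []
    for line in lines[lines.index(header) + 1:]:
        if ':' in line:
            break  # the next title line ends the section
        if line:
            result.append(line)
    return result

# Sample data copied from the question, split into lines
sample = """Name: xyz
Experience:
123 company 2016-2017
567 company 2017-2018
yzx company 2018-2019

Skills:
Python, MachineLearning, Java.""".split('\n')

experience = section_lines(sample, 'Experience:')
print(experience)
```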

I hope it's helpful.

Upvotes: 1
