Mahhos

Reputation: 101

Extracting information from text in python

I am new to text mining. I have a CSV file, and I need to go through each line, extract some information, and write it to another CSV file. The information I am looking for is identified by keywords that I keep in a dictionary. Consider the sentence below:

"the application version is 1.8.2 and the variable skt.len passes the required information. file ReadMe.txt has the specifications."

My dictionary is: ["application version", "variable", "file"]

I need to extract:

What is the best way to extract such information from text? I have been experimenting with NLTK and StanfordCoreNLP features, but I have not managed to extract the information yet. I am thinking of using a regex to extract the application version. Any ideas?

PS: I know this may complicate the task, but the sentences in different lines of the CSV file may have different structures. For example, "application version" in one line may be "app version" in another, and "file" in one line may be "filename" in another.

Upvotes: 0

Views: 1303

Answers (1)

Sourabh

Reputation: 83

I use R, and below is one way (not the best one, just to show how it works) to extract the value of variable, using str_extract from the stringr package:

>> str_extract(text, '(?<=variable\\s)(\\w+)(.)?(\\w+)?')

Here, text is the entire string you shared. This gives me the output:

>> skt.len

I am sure there are similar functions in Python to get this done and produce the output in the desired format.
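For what it's worth, here is a rough Python sketch of the same idea using the standard re module. It is only an illustration under a few assumptions: the value is taken to be the first whitespace-delimited token after the keyword (optionally after the word "is"), and the alias lists are guesses based on the PS in the question. The names FIELD_ALIASES, build_pattern and extract_fields are made up for this example.

```python
import re

# Hypothetical mapping from each field to the wordings that may appear in the text.
FIELD_ALIASES = {
    "application version": ["application version", "app version"],
    "variable": ["variable"],
    "file": ["filename", "file"],
}

def build_pattern(aliases):
    # Try longer aliases first so "filename" is preferred over "file".
    ordered = sorted(aliases, key=len, reverse=True)
    alternatives = "|".join(re.escape(a) for a in ordered)
    # Skip an optional " is", then capture the next whitespace-delimited token.
    return re.compile(
        r"\b(?:" + alternatives + r")\b(?:\s+is)?\s+(\S+)", re.IGNORECASE
    )

PATTERNS = {field: build_pattern(a) for field, a in FIELD_ALIASES.items()}

def extract_fields(sentence):
    result = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(sentence)
        # Drop trailing punctuation such as a comma or full stop after the value.
        result[field] = match.group(1).rstrip(".,;") if match else None
    return result

sentence = ("the application version is 1.8.2 and the variable skt.len passes "
            "the required information. file ReadMe.txt has the specifications.")
print(extract_fields(sentence))
# {'application version': '1.8.2', 'variable': 'skt.len', 'file': 'ReadMe.txt'}
```

Each row read from the input CSV could be passed through extract_fields, and the resulting dictionaries written out with csv.DictWriter. Sentences with a very different structure would need more patterns or a proper NLP approach.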

Upvotes: 1
