Reputation: 101
I am new to text mining. I have a CSV file, and I need to go through each line, extract some information, and write it to another CSV file. The information I am looking for is defined by the keys in a dictionary. Consider the sentence below:
"the application version is 1.8.2 and the variable skt.len passes the required information. file ReadMe.txt has the specifications."
My dictionary is: ["application version", "variable", "file"]
I need to extract:
What is the best way to extract such information from text? I have been experimenting with NLTK and Stanford CoreNLP, but I have not managed to extract the information yet. I am thinking of using a regex to extract the application version. Any ideas?
PS: I know this makes the task more complicated, but the sentences in each line of the CSV file may have different structures. For example, "application version" in one line may be "app version" in another, and "file" in one line may be "filename" in another.
Upvotes: 0
Views: 1303
Reputation: 83
I use R, and below is one way (not the best one, just enough to show how it works) to extract the value of the variable, using str_extract from the stringr package:
>> str_extract(text, '(?<=variable\\s)\\w+(\\.\\w+)?')
Here, text is the entire sentence you shared. This gives me the output:
>> skt.len
I am sure there are similar functions in Python to get this done and produce the output in the desired format.
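For what it is worth, here is a minimal Python sketch of the same lookup idea using the standard re and csv modules. It is only one possible approach under some assumptions: the alias table (ALIASES), the per-field value patterns (VALUE), and the helper names extract and process are all made up for illustration, and the sentence is assumed to sit in the first column of each CSV row. Real data will likely need more aliases and looser patterns.

```python
import csv
import re

# Hypothetical alias table: each output field lists the phrasings it may appear under.
ALIASES = {
    "application version": ["application version", "app version"],
    "variable": ["variable"],
    "file": ["filename", "file"],
}

# Hypothetical value patterns, one per field (assumptions about what a value looks like).
VALUE = {
    "application version": r"\d+(?:\.\d+)*",  # e.g. 1.8.2
    "variable": r"\w+(?:\.\w+)*",             # e.g. skt.len
    "file": r"\w+(?:\.\w+)+",                 # e.g. ReadMe.txt (needs an extension)
}


def build_pattern(field):
    # Try longer aliases first so "filename" wins over "file".
    keys = "|".join(
        re.escape(alias)
        for alias in sorted(ALIASES[field], key=len, reverse=True)
    )
    # Alias, an optional "is", then the value captured in group 1.
    return re.compile(
        rf"\b(?:{keys})\b(?:\s+is)?\s+({VALUE[field]})", re.IGNORECASE
    )


PATTERNS = {field: build_pattern(field) for field in ALIASES}


def extract(sentence):
    """Return a dict of field -> first matching value ('' if not found)."""
    result = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(sentence)
        result[field] = match.group(1) if match else ""
    return result


def process(src, dst):
    """Read sentences from the first column of src and write one row of fields to dst."""
    writer = csv.DictWriter(dst, fieldnames=list(ALIASES))
    writer.writeheader()
    for row in csv.reader(src):
        if row:  # skip blank lines
            writer.writerow(extract(row[0]))


sentence = (
    "the application version is 1.8.2 and the variable skt.len passes "
    "the required information. file ReadMe.txt has the specifications."
)
print(extract(sentence))
# {'application version': '1.8.2', 'variable': 'skt.len', 'file': 'ReadMe.txt'}
```

To run it over files, open the input and output with newline="" and pass both handles to process. Keeping the aliases in a table means a new phrasing such as "app version" is a one-line change rather than a new regex.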
Upvotes: 1