Reputation: 103
I'm trying to analyze a text to find all the 'NN' and 'NNP' tags. So far the code works well, but when I save the output to a CSV file I haven't been able to get the format I want, which is: Word, Tag, Question Analyzed.
This is the code:
import csv
import nltk

training_set = []
text = 'I want to analized this text'
tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
# Keep only the nouns (NN) and proper nouns (NNP)
result = [(word, tag) for word, tag in tagged if tag in ('NN', 'NNP')]
for i in result:
    training_set.append(i)
    training_set.append([text])
print(training_set)
listFile2 = open('sample.csv', 'w', newline='')
writer2 = csv.writer(listFile2, quoting=csv.QUOTE_ALL, lineterminator='\n', delimiter=',')
for item in training_set:
    writer2.writerow(item)
The outcome is the following:
Any idea how I can keep all the information on the same line, like this:
I have changed the code to use two lists and then zip them when writing to the CSV file. This seems to work; however, every field ends up wrapped in "" and ().
import csv
import nltk

training_set = []
question = []
text = 'I want to analyzed this text'
tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
# Keep only the nouns (NN) and proper nouns (NNP)
result = [(word, tag) for word, tag in tagged if tag in ('NN', 'NNP')]
for i in result:
    training_set.append(i)
    question.append([text])
listFile2 = open('sample.csv', 'w', newline='')
writer2 = csv.writer(listFile2, quoting=csv.QUOTE_ALL, lineterminator='\n', delimiter=',')
for item in zip(training_set, question):
    writer2.writerow(item)
Result:
Upvotes: 0
Views: 672
Reputation: 1614
You can try something like this to get your data into the desired format before writing it to the CSV file:
[tag + (text,) for tag in result]
OUTPUT:
[('text', 'NN', 'I want to analyze this text')]
This gives you a list of flat three-element tuples (word, tag, text) in the format you need, which you can then write to your CSV file one row per tuple.
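For completeness, here is a minimal sketch of how those flattened tuples might be written out. To keep it runnable without NLTK data, the `tagged` list is hard-coded as a hypothetical stand-in for the output of `nltk.pos_tag(nltk.word_tokenize(text))`, and the rows go to an in-memory `io.StringIO` buffer rather than `sample.csv`. The header row is also an assumption based on the column names in the question.

```python
import csv
import io

text = 'I want to analyze this text'

# Assumption: hard-coded stand-in for nltk.pos_tag(nltk.word_tokenize(text)),
# so the example runs without downloading NLTK models.
tagged = [('I', 'PRP'), ('want', 'VBP'), ('to', 'TO'),
          ('analyze', 'VB'), ('this', 'DT'), ('text', 'NN')]

# Keep only nouns and proper nouns, as in the question.
result = [(word, tag) for word, tag in tagged if tag in ('NN', 'NNP')]

# Flatten each (word, tag) pair and the sentence into one 3-element tuple.
rows = [pair + (text,) for pair in result]

# Write to an in-memory buffer; swap in open('sample.csv', 'w', newline='')
# to write a real file instead.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator='\n')
writer.writerow(('Word', 'Tag', 'Question Analyzed'))  # assumed header row
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Because each row is already a flat tuple of strings, `csv.writer` quotes every field individually. The earlier `zip` attempt passed a tuple and a list as the two fields, so their Python representations (parentheses and brackets included) were written into the file.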
Upvotes: 1