I have very little experience with reading and writing files in Python, and I would like to know the best approach for my particular case.
I have a tab-separated file with the following structure, where each sentence is separated by a blank line:
Roundup NN
: :
Muslim NNP
Brotherhood NNP
vows VBZ
daily JJ
protests NNS
in IN
Egypt NNP

Families NNS
with IN
no DT
information NN
on IN
the DT
whereabouts NN
of IN
loved VBN
ones NNS
are VBP
grief JJ
- :
stricken JJ
. .

The DT
provincial JJ
departments NNS
of IN
supervision NN
and CC
environmental JJ
protection NN
jointly RB
announced VBN
on IN
May NNP
9 CD
that IN
the DT
supervisory JJ
department NN
will MD
question VB
and CC
criticize VB
mayors NNS
who WP
fail VBP
to TO
curb VB
pollution NN
. .
(...)
I want to append a tab followed by a given string to each non-empty line of this file.
For each line, the string to append depends on the value stored in lab_pred_tags in the code below. On each iteration of the for loop, lab_pred_tags has as many elements as there are lines in the corresponding sentence of the text file; in the example, the lengths of lab_pred_tags for the 3 for loop iterations are 9, 15, and 28.
On the first for loop iteration, lab_pred_tags contains the list: ['O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O', 'O', 'B-GPE']
# (...) code to calculate lab_pred
for lab, lab_pred, length in zip(labels, labels_pred, sequence_lengths):
    lab = lab[:length]
    lab_pred = lab_pred[:length]
    # Convert lab_pred from a sequence of numbers to a sequence of strings
    lab_pred_tags = d_u.label_idxs_to_tags(lab_pred, tags)
    # Now what is the best solution to append each element of `lab_pred_tags` to each line in the file?
    # Keep in mind that I will need to skip a line every time a new for loop iteration is started
For the example, the desired output file is:
Roundup NN O
: : O
Muslim NNP B-ORG
Brotherhood NNP I-ORG
vows VBZ O
daily JJ O
protests NNS O
in IN O
Egypt NNP B-GPE

Families NNS O
with IN O
no DT O
information NN O
on IN O
the DT O
whereabouts NN O
of IN O
loved VBN O
ones NNS O
are VBP O
grief JJ O
- : O
stricken JJ O
. . O

The DT O
provincial JJ O
departments NNS O
of IN O
supervision NN O
and CC O
environmental JJ O
protection NN O
jointly RB O
announced VBN O
on IN O
May NNP O
9 CD O
that IN O
the DT O
supervisory JJ O
department NN O
will MD O
question VB O
and CC O
criticize VB O
mayors NNS O
who WP O
fail VBP O
to TO O
curb VB O
pollution NN O
. . O
What is the best solution for this?
Upvotes: 0
For testing purposes, I modified the lab_pred_tags list so that a single list can be reused for every sentence. Here is my solution:
lab_pred_tags = ['O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O', 'O',
                 'B-GPE', 'O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O',
                 'O', 'B-GPE', 'O', 'O', 'B-ORG', 'I-ORG', 'O', 'O',
                 'O', 'O', 'B-GPE', 'O']
index = 0
with open("PATH_TO_YOUR_FILE", "r") as lab_file, \
        open("PATH_TO_NEW_FILE", "w") as lab_file_2:
    # Iterate over the file line by line instead of loading it all at once
    for lab_file_list_element in lab_file:
        if lab_file_list_element == "\n":
            # Blank line: a new sentence starts, so restart the tag index
            index = 0
            lab_file_2.write("\n")
        else:
            # Append a tab and the tag, then restore the newline
            # (this also works if the last line has no trailing newline)
            new_line_element = (
                lab_file_list_element.rstrip("\n")
                + "\t" + lab_pred_tags[index] + "\n"
            )
            index += 1
            lab_file_2.write(new_line_element)
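The snippet above reuses one test list for every sentence. In the question, each for loop iteration produces its own lab_pred_tags, so here is a minimal sketch of how the same idea could be adapted to one tag list per sentence. The function name append_tags and the in-memory sample data are made up for illustration, and the sketch assumes the tag lists arrive in the same order as the sentences in the file.

```python
import io


def append_tags(src, dst, tags_per_sentence):
    """Copy src to dst, appending a tab and one tag to every non-empty line.

    tags_per_sentence is an iterable of tag lists, one list per sentence,
    assumed to be in the same order as the sentences in src.
    """
    tag_lists = iter(tags_per_sentence)
    current = None  # iterator over the tags of the sentence being read
    for line in src:
        text = line.rstrip("\n")
        if not text.strip():
            # Blank line: the sentence ended, so drop its tag iterator
            current = None
            dst.write("\n")
            continue
        if current is None:
            # First line of a new sentence: fetch the next tag list
            current = iter(next(tag_lists))
        dst.write(text + "\t" + next(current) + "\n")


# Small in-memory demonstration with made-up data
src = io.StringIO("Muslim\tNNP\nBrotherhood\tNNP\n\nEgypt\tNNP\n")
dst = io.StringIO()
append_tags(src, dst, [["B-ORG", "I-ORG"], ["B-GPE"]])
result = dst.getvalue()
```

With real files, src and dst would be the two handles from the with open(...) statement, and tags_per_sentence could be a list built by appending lab_pred_tags on each iteration of the loop in the question. If a sentence has more lines than tags, or the file has more sentences than tag lists, next() raises StopIteration, which at least makes a mismatch visible instead of silently writing wrong tags.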
Upvotes: 1