Reputation: 39
I have a Chinese txt file with thousands of sentence lines, like the following:
…………
I want to combine every two adjacent lines into one line, so that it is transformed into:
How can I use Python to finish the combination?
Upvotes: 0
Views: 66
Reputation: 11536
You don't need Python for that; sed is enough:
$ seq 15 > lines
$ cat lines
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
$ sed 'N;s/\n/ /g' lines
1 2
3 4
5 6
7 8
9 10
11 12
13 14
15
According to man sed:
n N Read/append the next line of input into the pattern space.
and
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
And, since sed executes the given script for each line, the trailing newline of that line is not included in the pattern space (it would be redundant to include it). So the executed sequence is:
N: Append the next line to the pattern space. Now that there are two lines in the pattern space, they have to be separated by a newline, so there is a newline character in the middle of the pattern space.
s/\n/ /: Replace that newline character with a space.
With GNU sed, when the file has an odd number of lines, N finds no next line on the last one and sed prints that line unchanged, which is why 15 appears on its own above.
Upvotes: 1
Reputation: 414139
A file is an iterator over lines in Python. You could use the grouper() recipe from itertools to group the lines into pairs:
#!/usr/bin/env python2
from itertools import izip_longest

with open('Chinese.txt') as file:
    # Passing the same file object twice yields consecutive lines in pairs.
    for line, another in izip_longest(file, file, fillvalue=''):
        print line.rstrip('\n'), another,
The comma at the end of the print statement is the file.softspace hack, to avoid duplicating newlines.
The code keeps only two lines in memory at a time, and therefore it can handle arbitrarily large files.
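For readers on Python 3, where izip_longest became zip_longest and print is a function, the same pairing idea might look like the sketch below. The helper name join_pairs is made up for illustration, and io.StringIO stands in for the real file so the example runs anywhere; opening Chinese.txt with encoding='utf-8' is an assumption about how the file is encoded.

```python
import io
from itertools import zip_longest


def join_pairs(lines):
    """Yield every two consecutive lines joined by a single space."""
    it = iter(lines)
    # The same iterator is passed twice, so each step consumes two lines.
    # With an odd line count the missing partner is filled with None.
    for pair in zip_longest(it, it):
        yield ' '.join(part.rstrip('\n') for part in pair if part is not None)


# Stand-in for: open('Chinese.txt', encoding='utf-8')  (encoding is assumed)
sample = io.StringIO('一\n二\n三\n四\n五\n')
for joined in join_pairs(sample):
    print(joined)
```

Like the original, this reads lazily, so only the current pair of lines is held in memory.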
Upvotes: 0
Reputation: 1520
You should iterate over your file as follows:
with open('./chinese.txt') as my_file:
    for line in my_file:
        first = line.rstrip('\n')
        try:
            # Pull the following line from the same iterator (Python 2).
            print '{} {}'.format(first, my_file.next().rstrip('\n'))
        except StopIteration:  # Odd number of lines: the last one has no partner
            print first
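The file.next() method only exists in Python 2. A rough Python 3 take on the same loop could use the built-in next() with a default value instead of catching StopIteration. The function name join_with_next is hypothetical, and a small list stands in for the open file.

```python
def join_with_next(lines):
    """Yield each line joined with the line that follows it."""
    it = iter(lines)
    for line in it:
        # next() with a default returns None instead of raising
        # StopIteration when the number of lines is odd.
        following = next(it, None)
        if following is None:
            yield line.rstrip('\n')
        else:
            yield line.rstrip('\n') + ' ' + following.rstrip('\n')


# A list of lines stands in for the file object here.
for joined in join_with_next(['1\n', '2\n', '3\n']):
    print(joined)
```

An open text file can be passed in place of the list, since it is also an iterator over lines.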
Upvotes: 0
Reputation: 8569
If you have read all the lines into a list called lines, then you could use a list comprehension, like this one:
[ l1 + ' ' + l2 for l1,l2 in zip(lines[::2], lines[1::2]) ]
Note that zip() stops at the shorter sequence, so this assumes an even number of lines. If len(lines) % 2 == 1, use lines[-1] to print out or use the last line by itself.
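Putting the comprehension and the odd-length check together, a minimal sketch might look like this. The function name join_pairs_list is made up for illustration, and it assumes lines is a list of strings with the newlines already removed (for example, from str.splitlines()).

```python
def join_pairs_list(lines):
    """Join lines[0] with lines[1], lines[2] with lines[3], and so on."""
    joined = [a + ' ' + b for a, b in zip(lines[::2], lines[1::2])]
    # zip() drops the unpaired last element, so add it back by itself.
    if len(lines) % 2 == 1:
        joined.append(lines[-1])
    return joined


lines = 'one\ntwo\nthree\nfour\nfive'.splitlines()
for joined in join_pairs_list(lines):
    print(joined)
```

Unlike the iterator-based approaches, this holds the whole file in memory, which is fine for a few thousand lines.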
Upvotes: 0