Reputation: 21
I would like to replace a sequence of numbers in a file with another sequence of numbers. For example, I want the code to find:
5723
5724
5725
.
.
in the file and replace them with
1
2
3
.
.
The file format looks like this:
5723 1 4 0.0530 40.8469574826 23.6497161096 71.2721134368 # hc
5724 1 4 0.0530 41.2184192051 22.0657965663 70.7655969235 # hc
5725 1 4 0.0530 40.1209834536 22.2320441560 72.1100610464 # hc
5726 1 2 0.0390 38.2072673529 21.5636299564 70.4226801302 # ni
5727 1 3 0.0080 39.1491515464 22.7414447024 70.1836001683 # c1
5728 1 4 0.0530 38.6092690356 23.6286807105 70.4379331882 # hc
5729 1 5 -0.1060 39.4744610200 22.9631667398 68.7099315672 # c
5730 1 4 0.0530 39.7733681662 22.0164196098 68.2561710623 # hc
5731 1 4 0.0530 40.3997078786 23.5957910115 68.6602988667 # hc
5732 1 6 -0.1768 37.4127695738 20.7445960448 69.5033013922 # c5
5733 1 7 0.1268 37.5907142 20.8480311755 68.4090824525 # h
I've written this code to do it, but it only replaces the first one. How can I fix it?
import os
import sys
import fileinput
masir = os.curdir + '\\test\\'
input = open('poly-IL9.data', 'r')
output = open('out.data', 'w')
range1 = range(5722,13193)
range2 = range(1,7472)
for i in range(len(range1)):
    for j in range(len(range2)):
        x = str(range1[i])
        y = str(range2[j])
        clean = input.read().replace(x,y)
        output.write(clean)
Upvotes: 1
Views: 1659
Reputation: 107297
First of all, open your files using a with statement rather than opening them without ever closing them.
The with statement is used to wrap the execution of a block with methods defined by a context manager.
Read more about the with statement and the advantages of using it.
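A quick way to see what with buys you: the file object reports itself closed as soon as the block ends, even though nothing calls close() explicitly. This is a minimal sketch; the temporary file from the standard tempfile module is a hypothetical stand-in for poly-IL9.data.

```python
import os
import tempfile

# Create a throwaway file to stand in for poly-IL9.data (hypothetical content).
fd, path = tempfile.mkstemp(suffix=".data")
os.close(fd)

with open(path, "w") as f:
    f.write("5723 1 4 0.0530\n")
    inside = f.closed  # still open while the block runs

after = f.closed  # the context manager closed it when the block exited

os.remove(path)
print(inside, after)  # False True
```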
All you need to do here is loop over the file, split each line, and replace the first field with the line number:
with open('poly-IL9.data', 'r') as inp, open('out.data', 'w') as out:
    for i, line in enumerate(inp, 1):
        out.write(' '.join([str(i)] + line.split()[1:]) + '\n')
You can use enumerate to loop over the file object and get a running line number (starting at 1 here) along with each line.
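The loop above can be packaged as a small function and checked against a few of the sample lines without touching the disk. This is a sketch: renumber is a hypothetical helper name, and io.StringIO objects stand in for the real input and output files. enumerate(inp, start) supplies the new number, so the old first field is simply dropped.

```python
import io

def renumber(inp, out, start=1):
    """Replace the first whitespace-separated field of every line with a running counter."""
    for i, line in enumerate(inp, start):
        fields = line.split()
        out.write(" ".join([str(i)] + fields[1:]) + "\n")

# In-memory stand-ins for poly-IL9.data and out.data.
src = io.StringIO(
    "5723 1 4 0.0530 40.8469574826 23.6497161096 71.2721134368 # hc\n"
    "5724 1 4 0.0530 41.2184192051 22.0657965663 70.7655969235 # hc\n"
    "5726 1 2 0.0390 38.2072673529 21.5636299564 70.4226801302 # ni\n"
)
dst = io.StringIO()
renumber(src, dst)
print(dst.getvalue())
```

The sample file has no blank lines; if yours does, note that a blank line would be written out as just its number.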
Alternatively, you can use the csv module to read the file, which saves you from splitting the lines yourself:
import csv
with open('poly-IL9.data', 'r', newline='') as inp, open('out.data', 'w') as out:
    reader = csv.reader(inp, delimiter=' ')
    for i, row in enumerate(reader, 1):  # start at 1 so the first line becomes 1
        out.write(' '.join([str(i)] + row[1:]) + '\n')
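The csv route works on the sample as shown because every field appears to be separated by exactly one space. A sketch of the caveat, using made-up lines based on the sample: with delimiter=' ', each individual space counts as a separator, so a doubled space produces an empty field and shifts every later column by one.

```python
import csv

# Two versions of the same record: single spaces, and a doubled space after the first field.
single = "5723 1 4 0.0530"
double = "5723  1 4 0.0530"

# csv.reader accepts any iterable of strings; each space is treated as a delimiter,
# so two spaces in a row yield an empty field between them.
row_single = next(csv.reader([single], delimiter=" "))
row_double = next(csv.reader([double], delimiter=" "))

print(row_single)  # ['5723', '1', '4', '0.0530']
print(row_double)  # ['5723', '', '1', '4', '0.0530']
```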
Note: if the fields are separated by other whitespace characters (such as tabs) or a mix of them, you can use re.split() to split each line with a regular expression:
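import re
with open('poly-IL9.data', 'r') as inp, open('out.data', 'w') as out:
    for i, line in enumerate(inp, 1):
        # strip() removes the trailing newline so re.split() doesn't leave an empty last field
        fields = re.split(r'\s+', line.strip())
        out.write(' '.join([str(i)] + fields[1:]) + '\n')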
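For what it's worth, Python's built-in str.split() called with no argument already splits on any run of whitespace (spaces, tabs, or a mix) and ignores leading and trailing whitespace, so the regex is optional. A small comparison on a made-up line with mixed separators:

```python
import re

# A hypothetical line with mixed separators: tabs, a double space, and a trailing newline.
line = "5723\t1  4 0.0530\t40.8469574826 # hc\n"

# Without strip(), the trailing newline leaves an empty string at the end.
without_strip = re.split(r"\s+", line)

# Regex split on runs of whitespace after stripping the line.
by_regex = re.split(r"\s+", line.strip())

# str.split() with no separator argument gives the same fields directly.
by_split = line.split()

print(without_strip[-1] == "")  # True
print(by_regex == by_split)  # True
print(" ".join(["1"] + by_split[1:]))  # 1 1 4 0.0530 40.8469574826 # hc
```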
Upvotes: 1
Reputation: 76927
If you want to work with the data itself, consider using the pandas library.
Here's one way to do it in pandas.
Read the CSV file using pd.read_csv:
In [4]: df = pd.read_csv('temp.csv')
In [5]: df
Out[5]:
b c d e f g
5723 1 4 0.0530 40.846957 23.649716 71.272113
5724 1 4 0.0530 41.218419 22.065797 70.765597
5725 1 4 0.0530 40.120983 22.232044 72.110061
5726 1 2 0.0390 38.207267 21.563630 70.422680
5727 1 3 0.0080 39.149152 22.741445 70.183600
5728 1 4 0.0530 38.609269 23.628681 70.437933
5729 1 5 -0.1060 39.474461 22.963167 68.709932
5730 1 4 0.0530 39.773368 22.016420 68.256171
5731 1 4 0.0530 40.399708 23.595791 68.660299
5732 1 6 -0.1768 37.412770 20.744596 69.503301
5733 1 7 0.1268 37.590714 20.848031 68.409082
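Use reset_index(drop=True) to replace the existing index with the default integer index, which starts from 0. drop=True discards the old index instead of inserting it as a column. Like set_index below, it returns a new DataFrame, so assign the result back to df if you want to keep it: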
In [6]: df.reset_index(drop=True)
Out[6]:
b c d e f g
0 1 4 0.0530 40.846957 23.649716 71.272113
1 1 4 0.0530 41.218419 22.065797 70.765597
2 1 4 0.0530 40.120983 22.232044 72.110061
3 1 2 0.0390 38.207267 21.563630 70.422680
4 1 3 0.0080 39.149152 22.741445 70.183600
5 1 4 0.0530 38.609269 23.628681 70.437933
6 1 5 -0.1060 39.474461 22.963167 68.709932
7 1 4 0.0530 39.773368 22.016420 68.256171
8 1 4 0.0530 40.399708 23.595791 68.660299
9 1 6 -0.1768 37.412770 20.744596 69.503301
10 1 7 0.1268 37.590714 20.848031 68.409082
You could also build a new index that starts from 1 (this uses NumPy, imported as np):
In [7]: df.set_index(np.arange(1, len(df)+1))
Out[7]:
b c d e f g
1 1 4 0.0530 40.846957 23.649716 71.272113
2 1 4 0.0530 41.218419 22.065797 70.765597
3 1 4 0.0530 40.120983 22.232044 72.110061
4 1 2 0.0390 38.207267 21.563630 70.422680
5 1 3 0.0080 39.149152 22.741445 70.183600
6 1 4 0.0530 38.609269 23.628681 70.437933
7 1 5 -0.1060 39.474461 22.963167 68.709932
8 1 4 0.0530 39.773368 22.016420 68.256171
9 1 4 0.0530 40.399708 23.595791 68.660299
10 1 6 -0.1768 37.412770 20.744596 69.503301
11 1 7 0.1268 37.590714 20.848031 68.409082
Note: there are simpler ways to just modify the file. However, if you want to process and analyze the data, pandas will make your life easier.
Upvotes: 0
Reputation: 1779
The read() method in clean = input.read().replace(x,y) reads the entire file at once. After that first call the file position is at the end, so every later read() returns an empty string, and only the first replacement ever reaches the output. Try readline(), or preferably for line in file:, to process the file line by line.
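To see the difference, here is a sketch that runs the question's read-inside-the-loop pattern and a line-by-line version on a three-line excerpt of the data. The old-to-new mapping (5723→1, 5724→2, 5725→3) is taken from the example at the top of the question and is an assumption about the full file; io.StringIO objects stand in for the real files.

```python
import io

data = "5723 1 4 0.0530\n5724 1 4 0.0530\n5725 1 4 0.0530\n"

# 1) The question's pattern: read() inside the loop.
f = io.StringIO(data)  # stands in for the opened input file
buggy = io.StringIO()
for x, y in [("5723", "1"), ("5724", "2"), ("5725", "3")]:
    # The first read() returns the whole file; every later call returns ''.
    buggy.write(f.read().replace(x, y))

# 2) Line by line: look up the new number for each line's first field.
mapping = {str(old): str(new) for old, new in zip(range(5723, 5726), range(1, 4))}
f = io.StringIO(data)
fixed = io.StringIO()
for line in f:
    fields = line.split()
    if not fields:  # keep blank lines as they are
        fixed.write(line)
        continue
    fields[0] = mapping.get(fields[0], fields[0])  # leave unknown IDs unchanged
    fixed.write(" ".join(fields) + "\n")

print(buggy.getvalue())
print(fixed.getvalue())
```

Only the first line changes in the buggy output, which matches the symptom described in the question.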
Upvotes: 0