farshid

Reputation: 21

Python: find and replace sequence of numbers in a file

I would like to replace a sequence of numbers in a file with another sequence of numbers. For example, I want the code to find:

5723
5724
5725
.
.

in the file and replace it with

1
2
3
.
.

The file format looks like this:

    5723    1   4  0.0530  40.8469574826  23.6497161096  71.2721134368  # hc
    5724    1   4  0.0530  41.2184192051  22.0657965663  70.7655969235  # hc
    5725    1   4  0.0530  40.1209834536  22.2320441560  72.1100610464  # hc
    5726    1   2  0.0390  38.2072673529  21.5636299564  70.4226801302  # ni
    5727    1   3  0.0080  39.1491515464  22.7414447024  70.1836001683  # c1
    5728    1   4  0.0530  38.6092690356  23.6286807105  70.4379331882  # hc
    5729    1   5 -0.1060  39.4744610200  22.9631667398  68.7099315672  # c
    5730    1   4  0.0530  39.7733681662  22.0164196098  68.2561710623  # hc
    5731    1   4  0.0530  40.3997078786  23.5957910115  68.6602988667  # hc
    5732    1   6 -0.1768  37.4127695738  20.7445960448  69.5033013922  # c5
    5733    1   7  0.1268  37.5907142     20.8480311755  68.4090824525  # h

I've written this code to do it, but it only replaces the first number. How can I correct it?

import os
import sys
import fileinput

masir = os.curdir + '\\test\\'
input  = open('poly-IL9.data', 'r')
output = open('out.data', 'w')
range1 = range(5722,13193)
range2 = range(1,7472)


for i in range(len(x1)):
    for j in range(len(y1)):
        x = str(range1[i])
        y = str(range2[j])
        clean = input.read().replace(x,y)
        output.write(clean)

Upvotes: 1

Views: 1659

Answers (3)

Kasravnd

Reputation: 107297

First of all, open your files with a with statement, so that they are closed automatically instead of being left open.

The with statement is used to wrap the execution of a block with methods defined by a context manager.

Read more about the with statement and its advantages.

All you need here is to loop over the file, split each line, and replace the first element with the line number:

with open('poly-IL9.data', 'r') as inp, open('out.data', 'w') as out:
    for i, line in enumerate(inp, 1):
        out.write(' '.join([str(i)] + line.split()[1:]) + '\n')

enumerate loops over the file object while keeping a running line count; the second argument, 1, makes the count start at 1 instead of 0.
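To try the enumerate approach without touching real files, here is a minimal sketch that runs on an in-memory sample. The helper name renumber and the shortened sample rows are assumptions made for illustration; skipping blank lines is also a choice of this sketch, not something the original snippet does.

```python
import io

def renumber(lines, start=1):
    """Yield each non-blank line with its first field replaced by a counter."""
    # Filter blank lines first so they do not consume a number.
    non_blank = (line for line in lines if line.strip())
    for i, line in enumerate(non_blank, start):
        fields = line.split()          # split on any run of whitespace
        yield ' '.join([str(i)] + fields[1:])

# Shortened sample rows in the same layout as the question's file.
sample = io.StringIO(
    "    5723    1   4  0.0530  # hc\n"
    "    5724    1   4  0.0530  # hc\n"
    "    5726    1   2  0.0390  # ni\n"
)
result = list(renumber(sample))
```

Note that joining with a single space collapses the original column alignment, just as the snippet above does.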

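As an alternative, you can read the file with the csv module, so that you do not have to split the lines yourself. Because the columns are separated by runs of spaces, the reader produces empty fields, which are filtered out here:

import csv
with open('poly-IL9.data', 'r', newline='') as inp, open('out.data', 'w') as out:
    reader = csv.reader(inp, delimiter=' ')
    for i, row in enumerate(reader, 1):
        fields = [f for f in row if f]  # drop empty fields caused by repeated spaces
        out.write(' '.join([str(i)] + fields[1:]) + '\n')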

Note that if your file is separated by other whitespace characters, or by a mix of them, you can use the re.split() function to split each line with a regex. Strip the line first, otherwise the leading whitespace produces an empty first element:

import re
with open('poly-IL9.data', 'r') as inp, open('out.data', 'w') as out:
    for i, line in enumerate(inp, 1):
        out.write(' '.join([str(i)] + re.split(r'\s+', line.strip())[1:]) + '\n')
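If the column alignment of the original file matters, a regex can also rewrite just the leading number and leave the rest of the line untouched. The following is only a sketch under that assumption; the names FIRST_INT and renumber_keep_layout are hypothetical. Anchoring the pattern at the start of the line also avoids a pitfall of a plain str.replace, which would match the old number anywhere, including inside the coordinate columns.

```python
import re

# Leading whitespace followed by the first integer on the line.
FIRST_INT = re.compile(r'^(\s*)(\d+)')

def renumber_keep_layout(lines, start=1):
    """Replace the leading integer of each line, right-aligned in its old width."""
    out = []
    for i, line in enumerate(lines, start):
        def repl(m, i=i):
            # Width of the original field, including its leading spaces.
            width = len(m.group(1)) + len(m.group(2))
            return str(i).rjust(width)
        # Lines that do not start with an integer are passed through unchanged.
        out.append(FIRST_INT.sub(repl, line, count=1))
    return out

sample = [
    "    5723    1   4  0.0530  # hc\n",
    "    5724    1   4  0.0530  # hc\n",
]
result = renumber_keep_layout(sample)
```

If a new number is wider than the old field, rjust returns it unpadded, so the line simply grows rather than being truncated.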

Upvotes: 1

Zero

Reputation: 76927

If you want to work with the data, consider using the pandas library.

Here's one way to do it in pandas.

Read the CSV file using pd.read_csv:

In [4]: df = pd.read_csv('temp.csv')

In [5]: df
Out[5]:
      b  c       d          e          f          g
5723  1  4  0.0530  40.846957  23.649716  71.272113
5724  1  4  0.0530  41.218419  22.065797  70.765597
5725  1  4  0.0530  40.120983  22.232044  72.110061
5726  1  2  0.0390  38.207267  21.563630  70.422680
5727  1  3  0.0080  39.149152  22.741445  70.183600
5728  1  4  0.0530  38.609269  23.628681  70.437933
5729  1  5 -0.1060  39.474461  22.963167  68.709932
5730  1  4  0.0530  39.773368  22.016420  68.256171
5731  1  4  0.0530  40.399708  23.595791  68.660299
5732  1  6 -0.1768  37.412770  20.744596  69.503301
5733  1  7  0.1268  37.590714  20.848031  68.409082

Use reset_index(drop=True) to replace the index with a fresh one. Here the index starts from 0:

In [6]: df.reset_index(drop=True)
Out[6]:
    b  c       d          e          f          g
0   1  4  0.0530  40.846957  23.649716  71.272113
1   1  4  0.0530  41.218419  22.065797  70.765597
2   1  4  0.0530  40.120983  22.232044  72.110061
3   1  2  0.0390  38.207267  21.563630  70.422680
4   1  3  0.0080  39.149152  22.741445  70.183600
5   1  4  0.0530  38.609269  23.628681  70.437933
6   1  5 -0.1060  39.474461  22.963167  68.709932
7   1  4  0.0530  39.773368  22.016420  68.256171
8   1  4  0.0530  40.399708  23.595791  68.660299
9   1  6 -0.1768  37.412770  20.744596  69.503301
10  1  7  0.1268  37.590714  20.848031  68.409082

You could also construct your own index starting from 1, like this:

In [7]: df.set_index(np.arange(1, len(df)+1))
Out[7]:
    b  c       d          e          f          g
1   1  4  0.0530  40.846957  23.649716  71.272113
2   1  4  0.0530  41.218419  22.065797  70.765597
3   1  4  0.0530  40.120983  22.232044  72.110061
4   1  2  0.0390  38.207267  21.563630  70.422680
5   1  3  0.0080  39.149152  22.741445  70.183600
6   1  4  0.0530  38.609269  23.628681  70.437933
7   1  5 -0.1060  39.474461  22.963167  68.709932
8   1  4  0.0530  39.773368  22.016420  68.256171
9   1  4  0.0530  40.399708  23.595791  68.660299
10  1  6 -0.1768  37.412770  20.744596  69.503301
11  1  7  0.1268  37.590714  20.848031  68.409082

Note: There are simpler ways to just modify the file. However, if you want to process or analyze the data, using pandas will make your life easier.

Upvotes: 0

Craig Burgler

Reputation: 1779

The read() call in clean = input.read().replace(x,y) reads the entire file at once. After the first call the file position is at the end of the file, so every later read() returns an empty string, which is why only one replacement is made. Try readline(), or the preferred for line in file: loop, to process the file line by line.
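The behaviour is easy to reproduce with an in-memory file. This is a small sketch using io.StringIO with made-up two-column rows as a stand-in for the real data file: the second read() comes back empty, while a line-by-line loop sees every row.

```python
import io

# In-memory stand-in for the data file (made-up rows).
f = io.StringIO("5723 a\n5724 b\n")

first = f.read()    # the whole content
second = f.read()   # empty string: the position is already at end of file

f.seek(0)           # rewind before iterating again
renumbered = []
for n, line in enumerate(f, 1):
    # Split off the old ID; the rest of the line keeps its trailing newline.
    old_id, rest = line.split(maxsplit=1)
    renumbered.append(f"{n} {rest}")
```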

Upvotes: 0
