Azam

Reputation: 147

Search for and split certain strings in a text file and save the output

How can I split a particular string in each row when it contains both digits and letters?

The data I have looks like this (tembin-data.dat):

['3317121918', '69N1345E', '15']

['3317122000', '72N1337E', '20']

['3317122006', '75N1330E', '20']

['3317122012', '78N1321E', '20']

['3317122018', '83N1310E', '25']

.......etc

I need to rearrange the data by removing "N" and "E" and splitting the field at those letters, like this:

['3317121918', '69','1345','15']

['3317122000', '72','1337','20']

['3317122006', '75','1330','20']

['3317122012', '78','1321','20']

['3317122018', '83','1310','25']

.......etc

The Python script I am using at the moment is this:

newfile = open('tembin-data.dat', 'w')
with open('tembin4.dat', 'r') as inF:
     for line in inF:
         myString = '331712'
         if myString in line:
             data=line.split()
             print data
             newfile.write("%s\n" % data)
newfile.close() 

The input file tembin4.dat looks like this:

REMARKS:

230900Z POSITION NEAR 7.8N 118.6E.

TROPICAL STORM 33W (TEMBIN), LOCATED APPROXIMATELY 769 NM EAST-

SOUTHEAST OF HO CHI MINH CITY, VIETNAM, HAS TRACKED WESTWARD AT

11 KNOTS OVER THE PAST SIX HOURS. MAXIMUM SIGNIFICANT WAVE HEIGHT

AT 230600Z IS 14 FEET. NEXT WARNINGS AT 231500Z, 232100Z, 240300Z

AND 240900Z.//

3317121918  69N1345E  15

3317122000  72N1337E  20

3317122006  75N1330E  20

3317122012  78N1321E  20

3317122018  83N1310E  25

3317122100  86N1295E  35

3317122106  85N1284E  35

3317122112  84N1276E  40

3317122118  79N1267E  50

3317122118  79N1267E  50

3317122200  78N1256E  45

3317122206  78N1236E  45

3317122212  80N1225E  45

3317122218  79N1214E  50

3317122218  79N1214E  50

3317122300  77N1204E  55

3317122300  77N1204E  55

3317122306  77N1193E  55

3317122306  77N1193E  55

NNNN

Upvotes: 0

Views: 82

Answers (5)

Ajax1234

Reputation: 71451

You can try this short solution in Python3:

import re
s = [['3317121918', '69N1345E', '15'], ['3317122000', '72N1337E', '20'], ['3317122006', '75N1330E', '20'], ['3317122012', '78N1321E', '20'],
['3317122018', '83N1310E', '25']]
new_s = [[a, *re.findall(r'\d+', b), c] for a, b, c in s]

Output:

[['3317121918', '69', '1345', '15'], ['3317122000', '72', '1337', '20'], ['3317122006', '75', '1330', '20'], ['3317122012', '78', '1321', '20'], ['3317122018', '83', '1310', '25']]
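The list above is hard-coded, so here is a rough sketch of how the same idea could be applied to the raw lines of tembin4.dat directly. The `ROW` pattern and the `parse_rows` helper are my own hypothetical names, not part of the answer, and the pattern assumes every data row looks like "digits, digits + N + digits + E, digits" separated by whitespace.

```python
import re

# Hypothetical pattern (an assumption about the row layout): an all-digit id,
# a "<digits>N<digits>E" field, and a trailing all-digit field.
ROW = re.compile(r"^(\d+)\s+(\d+)N(\d+)E\s+(\d+)\s*$")

def parse_rows(lines):
    """Return one [id, before_N, before_E, last] list per matching line."""
    rows = []
    for line in lines:
        m = ROW.match(line)
        if m:  # header text, blank lines and "NNNN" do not match and are skipped
            rows.append(list(m.groups()))
    return rows

# A few lines copied from the question's tembin4.dat sample
sample = [
    "AND 240900Z.//\n",
    "3317121918  69N1345E  15\n",
    "\n",
    "3317122000  72N1337E  20\n",
    "NNNN\n",
]

for row in parse_rows(sample):
    print(row)
```

With a real file you would pass the open file object instead of `sample`, since iterating over a file yields its lines.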

Upvotes: 1

Aaditya Ura

Reputation: 12669

You can use a positive lookahead (?=N) and a positive lookbehind (?<=N) and collect the matches:

import re
pattern = r"'\d+'|(\d+)(?=N)|(?<=N)(\d+)"
with open('file.txt','r') as f:
    for line in f:
        sub_list=[]
        search=re.finditer(pattern,line)
        for lin in search:
            sub_list.append(int(lin.group().strip("'")))

        if sub_list:
            print(sub_list)

output:

[3317121918, 69, 1345, 15]
[3317122000, 72, 1337, 20]
[3317122006, 75, 1330, 20]
[3317122012, 78, 1321, 20]
[3317122018, 83, 1310, 25]

Regex information:

'\d+'|(\d+)(?=N)|(?<=N)(\d+)

'\d+' matches one or more digits enclosed in single quotes
\d matches a digit (equal to [0-9])
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed

Positive Lookahead (?=N)
Asserts that the text immediately after the current position matches, without consuming it
N matches the character N literally (case sensitive)

Positive Lookbehind (?<=N)
Asserts that the text immediately before the current position matches, without consuming it
N matches the character N literally (case sensitive)
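To see the two assertions in isolation, here is a small sketch on a single field taken from the question. The variable names are only for illustration.

```python
import re

coord = "69N1345E"

# Lookahead: digits that are immediately followed by "N" (the "N" is not consumed)
before_n = re.search(r"\d+(?=N)", coord).group()

# Lookbehind: digits that are immediately preceded by "N"
after_n = re.search(r"(?<=N)\d+", coord).group()

print(before_n, after_n)
```

Neither match includes the letter itself, which is why no extra stripping of "N" is needed afterwards.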

Upvotes: 2

Vivek Harikrishnan

Reputation: 866

Just extending your approach with regex and split.

import re

# split on either letter; "[N|E]" would also treat a literal "|" as a separator
pat = re.compile("[NE]")

# both files are closed automatically when the with block ends
with open('tembin4.dat', 'r') as inF, open('tembin-data.dat', 'w') as newfile:
    for line in inF:
        myString = '331712'
        if myString in line:
            data = line.split()
            data[2:2] = pat.split(data[1])[:-1]  # insert the flattened list at index 2
            del data[1]  # remove the string with N and E from the list
            print(data)
            newfile.write("%s\n" % data)
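In case the `[:-1]` looks odd: splitting on a separator that sits at the very end of the string leaves an empty string as the last element. A quick sketch, using the character class "[NE]" (inside square brackets a "|" is a literal character, so it is not needed there):

```python
import re

pat = re.compile("[NE]")

parts = pat.split("69N1345E")
print(parts)        # ['69', '1345', '']  (trailing '' because the string ends with "E")
print(parts[:-1])   # ['69', '1345']
```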

Upvotes: 2

Sahil Dahiya

Reputation: 721

Try this:

import re

with open("tembin4.txt", "r") as f:
    for line in f:
        # drop the newline so print() does not emit an extra blank line
        lst = line.rstrip("\n").split(" ")
        for i, x in enumerate(lst):
            grp = re.findall(r'(\d+)N(\d+)E', x)
            if grp:
                # replace the "69N1345E" token with its two numeric parts
                lst[i:i+1] = grp[0]
        print(" ".join(lst))
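If plain space-separated output like the above is all you need, the same replacement can probably be done without splitting the line at all, using `re.sub` with backreferences. This is a different technique from the loop above, offered only as a sketch; `COORD` and `split_coord` are hypothetical names.

```python
import re

# Same pattern as above: digits, "N", digits, "E"
COORD = re.compile(r"(\d+)N(\d+)E")

def split_coord(line):
    """Replace '<a>N<b>E' with '<a> <b>' and leave everything else untouched."""
    return COORD.sub(r"\1 \2", line.rstrip("\n"))

print(split_coord("3317121918  69N1345E  15\n"))
print(split_coord("230900Z POSITION NEAR 7.8N 118.6E.\n"))
```

The second call returns the header line unchanged, because "N" and "E" are not joined by digits there.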

Upvotes: 2

Bhavesh Ghodasara

Reputation: 2071

Using pandas you can do this easily.

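import pandas as pd
import os # optional

os.chdir('C:\\Users') # optional
# skipinitialspace drops the blank after each comma, so column 1 reads "'69N1345E'"
df = pd.read_csv('tembin-data.dat', header=None, skipinitialspace=True)

# fixed positions: assumes two digits before N and four digits before E
df[3] = df[1].str.slice(1, 3)
df[4] = df[1].str.slice(4, 8)

df = df.drop(1, axis=1)

df.to_csv('tembin-out.dat', header=False)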

Upvotes: 1
