Reputation: 147
How to split certain strings in rows that contain both digits and letters
The data set I have looks like this (tembin-data.dat):
['3317121918', '69N1345E', '15']
['3317122000', '72N1337E', '20']
['3317122006', '75N1330E', '20']
['3317122012', '78N1321E', '20']
['3317122018', '83N1310E', '25']
.......etc
I need to rearrange the data by removing "N" and "E", like this:
['3317121918', '69','1345','15']
['3317122000', '72','1337','20']
['3317122006', '75','1330','20']
['3317122012', '78','1321','20']
['3317122018', '83','1310','25']
.......etc
The Python script I am using at the moment is:
newfile = open('tembin-data.dat', 'w')
with open('tembin4.dat', 'r') as inF:
    for line in inF:
        myString = '331712'
        if myString in line:
            data=line.split()
            print data
            newfile.write("%s\n" % data)
newfile.close()
The input file tembin4.dat looks like this:
REMARKS:
230900Z POSITION NEAR 7.8N 118.6E.
TROPICAL STORM 33W (TEMBIN), LOCATED APPROXIMATELY 769 NM EAST-
SOUTHEAST OF HO CHI MINH CITY, VIETNAM, HAS TRACKED WESTWARD AT
11 KNOTS OVER THE PAST SIX HOURS. MAXIMUM SIGNIFICANT WAVE HEIGHT
AT 230600Z IS 14 FEET. NEXT WARNINGS AT 231500Z, 232100Z, 240300Z
AND 240900Z.//
3317121918 69N1345E 15
3317122000 72N1337E 20
3317122006 75N1330E 20
3317122012 78N1321E 20
3317122018 83N1310E 25
3317122100 86N1295E 35
3317122106 85N1284E 35
3317122112 84N1276E 40
3317122118 79N1267E 50
3317122118 79N1267E 50
3317122200 78N1256E 45
3317122206 78N1236E 45
3317122212 80N1225E 45
3317122218 79N1214E 50
3317122218 79N1214E 50
3317122300 77N1204E 55
3317122300 77N1204E 55
3317122306 77N1193E 55
3317122306 77N1193E 55
NNNN
Upvotes: 0
Views: 82
Reputation: 71451
You can try this short solution in Python3:
import re
s = [['3317121918', '69N1345E', '15'], ['3317122000', '72N1337E', '20'], ['3317122006', '75N1330E', '20'], ['3317122012', '78N1321E', '20'],
['3317122018', '83N1310E', '25']]
new_s = [[a, *re.findall(r'\d+', b), c] for a, b, c in s]
Output:
[['3317121918', '69', '1345', '15'], ['3317122000', '72', '1337', '20'], ['3317122006', '75', '1330', '20'], ['3317122012', '78', '1321', '20'], ['3317122018', '83', '1310', '25']]
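To apply the same idea to the raw text lines rather than a hard-coded list, a minimal sketch might look like the following. The helper name `parse_track_line` and the sample lines are illustrative assumptions, not part of the answer above; it also assumes that every data line has exactly three whitespace-separated fields and starts with `331712`.

```python
import re

def parse_track_line(line):
    """Split one 'ID latNlonE wind' line into four string fields.

    Hypothetical helper: returns None for lines that do not look like track data.
    """
    fields = line.split()
    if len(fields) != 3 or not fields[0].startswith('331712'):
        return None
    a, b, c = fields
    # findall returns every run of digits in the middle field, in order
    return [a, *re.findall(r'\d+', b), c]

# Sample lines in the same shape as tembin4.dat
sample = [
    "AND 240900Z.//",
    "3317121918 69N1345E 15",
    "3317122000 72N1337E 20",
    "NNNN",
]
rows = [r for r in map(parse_track_line, sample) if r is not None]
print(rows)
```

Replacing `sample` with an open file object should give the same rows for the real data, since iterating a file also yields one line at a time.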
Upvotes: 1
Reputation: 12669
You can use a positive lookahead (?=N) and a positive lookbehind (?<=N) and just capture the groups:
import re
pattern = r"'\d+'|(\d+)(?=N)|(?<=N)(\d+)"
with open('file.txt', 'r') as f:
    for line in f:
        sub_list = []
        search = re.finditer(pattern, line)
        for lin in search:
            # group() is the whole match; strip the quotes around plain numbers
            sub_list.append(int(lin.group().strip("'")))
        if sub_list:
            print(sub_list)
output:
[3317121918, 69, 1345, 15]
[3317122000, 72, 1337, 20]
[3317122006, 75, 1330, 20]
[3317122012, 78, 1321, 20]
[3317122018, 83, 1310, 25]
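Regex information:
'\d+'|(\d+)(?=N)|(?<=N)(\d+)
'\d+' matches a run of digits enclosed in single quotes
\d matches a digit (equal to [0-9])
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
(\d+)(?=N) matches digits only when they are immediately followed by N
Positive Lookahead (?=N): asserts that N matches at the current position, without consuming it
(?<=N)(\d+) matches digits only when they are immediately preceded by N
Positive Lookbehind (?<=N): asserts that N matches immediately before the current position, without consuming it
N matches the character N literally (case sensitive)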
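As a small illustration of the two lookarounds on their own, the sketch below runs each one against a single field. The variable names `token`, `lat` and `lon` are only illustrative.

```python
import re

token = "69N1345E"

# Digits immediately followed by N; the lookahead checks for N but does not consume it.
lat = re.search(r"\d+(?=N)", token).group()

# Digits immediately preceded by N; the lookbehind checks for N but does not consume it.
lon = re.search(r"(?<=N)\d+", token).group()

print(lat, lon)  # 69 1345
```

Because neither lookaround consumes the N, the matched text contains only digits and can be passed straight to `int()`.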
Upvotes: 2
Reputation: 866
Just extending your approach with regex and split.
import re
newfile = open('tembin-data.dat', 'w')
pat = re.compile("[NE]")  # character class: split on either N or E
with open('tembin4.dat', 'r') as inF:
    for line in inF:
        myString = '331712'
        if myString in line:
            data = line.split()
            data[2:2] = pat.split(data[1])[:-1]  # insert the flattened list at index 2
            del data[1]  # remove the string containing N and E from the list
            print(data)
            newfile.write("%s\n" % data)
newfile.close()
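The splitting step can also be checked in isolation. The sketch below is one possible way to package it; the function name `split_position` is hypothetical. Note that inside a character class `|` is a literal character, so `[NE]` is enough to split on either letter.

```python
import re

pat = re.compile(r"[NE]")  # inside [...] no '|' is needed

def split_position(fields):
    """Return a new list with the 'latNlonE' field replaced by lat and lon."""
    # '69N1345E' -> ['69', '1345', '']; the trailing '' comes from the final E
    lat, lon, _trailing = pat.split(fields[1])
    return [fields[0], lat, lon] + fields[2:]

print(split_position("3317121918 69N1345E 15".split()))
```

Returning a new list instead of editing `data` in place avoids having to reason about how the indices shift after the insert and the delete.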
Upvotes: 2
Reputation: 721
Try this:
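import re
for line in open(r"tembin4.txt", "r"):
    lst = line.rstrip("\n").split(" ")  # drop the newline so print() does not double it
    for i, x in enumerate(lst):
        grp = re.findall(r'(\d+)N(\d+)E', x)
        if len(grp) != 0:
            # replace the combined field with its two numeric parts
            lst.remove(x)
            lst.insert(i, grp[0][1])
            lst.insert(i, grp[0][0])
    print(" ".join(lst))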
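If the goal is only to rewrite the text, a substitution can do the same job without modifying a list while looping over it. This is a different technique from the one above (`re.sub` with backreferences), sketched under the assumption that the position field always has the form digits, N, digits, E; the names `POSITION` and `expand_position` are illustrative.

```python
import re

# Whole-word match: digits, N, digits, E
POSITION = re.compile(r"\b(\d+)N(\d+)E\b")

def expand_position(line):
    """Replace 'latNlonE' with 'lat lon' and leave every other line unchanged."""
    return POSITION.sub(r"\1 \2", line.rstrip("\n"))

print(expand_position("3317121918 69N1345E 15\n"))              # 3317121918 69 1345 15
print(expand_position("230900Z POSITION NEAR 7.8N 118.6E.\n"))  # unchanged
```

The remarks line is left alone because `7.8N` and `118.6E.` never form a single digits-N-digits-E token.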
Upvotes: 2
Reputation: 2071
Using pandas you can do this easily.
import pandas as pd
import os # optional
os.chdir('C:\\Users') # optional
df = pd.read_csv('tembin-data.dat', header = None)
parts = df[1].str.extract(r'(\d+)N(\d+)E')  # capture the digits before N and before E, wherever the field starts
df[3] = parts[0]
df[4] = parts[1]
df = df.drop(1, axis = 1)
df.to_csv('tembin-out.dat',header=False)
Upvotes: 1