tonino
tonino

Reputation: 85

add characters each two places within sed

I am working with csv files, they seismic catalogs from a database, I need to arrange them like USGS format in order to start another steps.

My input data format is:

DatesT,Latitude,Longitude,Magnitude,Depth,Catalog
1909,7,23,170000,-17.430,-66.349,5.1,0,PRE-GEM-ISC
1913,12,14,024500,-17.780,-63.170,5.6,0,PRE-GEM-ISC

The USGS input format is

DatesT,Latitude,Longitude,Magnitude,Depth,Catalog
1909-7-23T17:00:00,-17.430,-66.349,5.1,0,PRE-GEM-ISC
1913-12-14T02:45:00,-17.780,-63.170,5.6,0,PRE-GEM-ISC

To "convert" my input to USGS format I did the following steps:

archi='catalog.txt'
sed 's/,/-/1' $archi > temp1.dat   # to change "," to "-" 
sed 's/,/-/1' temp1.dat > temp2.dat  # same as above
sed 's/,/T/1' temp2.dat > temp3.dat   # To add T between date and time
sed -i.bak "1 s/^.*$/DatesT,Latitude,Longitude,Magnitude,Depth,Catalog/" temp3.dat  #to preserve the header.

I have the following output:

DatesT,Latitude,Longitude,Magnitude,Depth,Catalog
1909-7-23T170000,-17.430,-66.349,5.1,0,PRE-GEM-ISC
1913-12-14T024500,-17.780,-63.170,5.6,0,PRE-GEM-ISC

I tried to implement the following command:

sed 's/.\{13\}/&: /g' temp3.dat > temp4.dat

Unfortunately it did not work as I thought because it did not have the same place for all lines.

Do you have any idea to improve my code?

Upvotes: 1

Views: 43

Answers (3)

potong
potong

Reputation: 58371

This might work for you (GNU sed):

sed -E  '1!s/^([^,]*),([^,]*),([^,]*),(..)(..)/\1-\2-\3T\4:\5:/' file

Forget about the header.

Replace the first and second fields delimiters (all fields are delimited by a comma ,) with a dash -.

Replace the third fields delimiter by T.

Split the fourth field into three equal parts and separate each part by a colon :.

N.B. The last part of the fourth field will stay as is and so does not need to be defined.

Sometimes as programmers we become too focused on data and would be better served by looking at the problem as an artist and coding what we see.

Upvotes: 2

Guru
Guru

Reputation: 16974

One way using GNU sed:

sed -r 's/([0-9]{4}),([0-9]{1,2}),([0-9]{1,2}),([0-9]{2})([0-9]{2})([0-9]{2})(,.*)/\1-\2-\3T\4:\5:\6\7/'  file

You split the file into individual tokens,meaning column as token one, 2nd column as token 2, and when it comes to 4th column, take 2 numbers as a token, and then substitute it as required.

Upvotes: 3

Ron
Ron

Reputation: 6561

You can do:

cat initialfile.csv|perl -p -e "s/^(\d{4}),(\d+),(\d+),(\d{2})(\d{2})(\d{2}),([0-9.-]+),([0-9.-]+),(.*)$/\1-\2-\3T\4:\5:\6,\7,\8,\9/g"

or for inline edit:

perl -p -i -e "s/^(\d{4}),(\d+),(\d+),(\d{2})(\d{2})(\d{2}),([0-9.-]+),([0-9.-]+),(.*)$/\1-\2-\3T\4:\5:\6,\7,\8,\9/g" initialfile.csv

which should output USGS format

Upvotes: 2

Related Questions