Reputation: 85
I am working with csv files, they seismic catalogs from a database, I need to arrange them like USGS format in order to start another steps.
My input data format is:
DatesT,Latitude,Longitude,Magnitude,Depth,Catalog
1909,7,23,170000,-17.430,-66.349,5.1,0,PRE-GEM-ISC
1913,12,14,024500,-17.780,-63.170,5.6,0,PRE-GEM-ISC
The USGS input format is
DatesT,Latitude,Longitude,Magnitude,Depth,Catalog
1909-7-23T17:00:00,-17.430,-66.349,5.1,0,PRE-GEM-ISC
1913-12-14T02:45:00,-17.780,-63.170,5.6,0,PRE-GEM-ISC
To "convert" my input to USGS format I did the following steps:
archi='catalog.txt'
sed 's/,/-/1' $archi > temp1.dat # to change "," to "-"
sed 's/,/-/1' temp1.dat > temp2.dat # same as above
sed 's/,/T/1' temp2.dat > temp3.dat # To add T between date and time
sed -i.bak "1 s/^.*$/DatesT,Latitude,Longitude,Magnitude,Depth,Catalog/" temp3.dat #to preserve the header.
I have the following output:
DatesT,Latitude,Longitude,Magnitude,Depth,Catalog
1909-7-23T170000,-17.430,-66.349,5.1,0,PRE-GEM-ISC
1913-12-14T024500,-17.780,-63.170,5.6,0,PRE-GEM-ISC
I tried to implement the following command:
sed 's/.\{13\}/&: /g' temp3.dat > temp4.dat
Unfortunately it did not work as I thought because it did not have the same place for all lines.
Do you have any idea to improve my code?
Upvotes: 1
Views: 43
Reputation: 58371
This might work for you (GNU sed):
sed -E '1!s/^([^,]*),([^,]*),([^,]*),(..)(..)/\1-\2-\3T\4:\5:/' file
Forget about the header.
Replace the first and second fields delimiters (all fields are delimited by a comma ,
) with a dash -
.
Replace the third fields delimiter by T
.
Split the fourth field into three equal parts and separate each part by a colon :
.
N.B. The last part of the fourth field will stay as is and so does not need to be defined.
Sometimes as programmers we become too focused on data and would be better served by looking at the problem as an artist and coding what we see.
Upvotes: 2
Reputation: 16974
One way using GNU sed:
sed -r 's/([0-9]{4}),([0-9]{1,2}),([0-9]{1,2}),([0-9]{2})([0-9]{2})([0-9]{2})(,.*)/\1-\2-\3T\4:\5:\6\7/' file
You split the file into individual tokens,meaning column as token one, 2nd column as token 2, and when it comes to 4th column, take 2 numbers as a token, and then substitute it as required.
Upvotes: 3
Reputation: 6561
You can do:
cat initialfile.csv|perl -p -e "s/^(\d{4}),(\d+),(\d+),(\d{2})(\d{2})(\d{2}),([0-9.-]+),([0-9.-]+),(.*)$/\1-\2-\3T\4:\5:\6,\7,\8,\9/g"
or for inline edit:
perl -p -i -e "s/^(\d{4}),(\d+),(\d+),(\d{2})(\d{2})(\d{2}),([0-9.-]+),([0-9.-]+),(.*)$/\1-\2-\3T\4:\5:\6,\7,\8,\9/g" initialfile.csv
which should output USGS format
Upvotes: 2