Reputation: 4643
first 2lines of my data:
"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06","123427","456060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"
I only want the last 3 characters of column 2 and column 3, I dont want the column header affected. happy for a solution that can do column2 first and then do column 3
I am fiddling with sed and awk at the minute but have no joy yet.
this is what I want:
"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"
edit1 this gives me the last 3 digits(+ "), just need to write this back to the orig file?
$ awk -F"," 'NR>1{ print $2}' head_test_real.csv | sed 's/.*\(....\)/\1/'
427"
592"
007"
592"
409"
742"
387"
731"
556"
edit2 this works but i lose the double quotes "123427" goes to 427, i ould like to keep the double quotes.
* NR>1 works on the rows after the 1st row.
$ awk -F, 'NR>1{$2=substr($2,length($2)-3,3)}1' OFS=, head_test_real.csv
"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06",427,"456060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"
edit3 @Mark tks fro correct answer, and here just for my ref on the quotes.
$ ####csv.QUOTE_ALL
$ cat out.csv
"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"
$ ####csv.QUOTE_MINIMAL
$ cat out.csv
Rec_Open_Date,MSISDN,IMEI,Data_Volume_Bytes,Device_Manufacturer,Device_Model,Product_Description
2015-10-06,427,060,137765,Samsung Korea,Samsung SM-G900I,$39 Plan
$ ###csv.QUOTE_NONNUMERIC
$ cat out.csv
"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"
$ ###csv.QUOTE_NONE
$ cat out.csv
Rec_Open_Date,MSISDN,IMEI,Data_Volume_Bytes,Device_Manufacturer,Device_Model,Product_Description
2015-10-06,427,060,137765,Samsung Korea,Samsung SM-G900I,$39 Plan
Upvotes: 0
Views: 66
Reputation: 241808
Perl to the rescue!
perl -pe 's/",".*?(...",")/","$1/ if $. > 1' < input > output
-p
reads the input line by line and prints the results/regex/replacement/
is a substitution.*?
matches anything (like .*
), but the question mark makes it "frugal", i.e. it matches the shortest string possible(...",")
creates a capture group starting three characters before ","
, it can be referenced as $1
.$.
is the line number, no replacement happens on line 1.Make sure the first two columns are always quoted and the second column is never shorter than 3 characters.
To modify the third column, you can modify the regex to
perl -pe 's/^("(?:.*?","){2}).*?(...",")/$1$2/ if $. > 1'
# ~
Modify the indicated number to handle any column you like.
Upvotes: 1
Reputation: 203229
$ awk 'BEGIN{FS=OFS="\",\""} NR>1{for (i=2;i<=3;i++) $i=substr($i,length($i)-2)} 1' file
"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"
As with any command, to write back to the original file is just:
command file > tmp && mv tmp file
Upvotes: 1
Reputation: 95242
While awk
seems like a natural fit for comma-separated data, it doesn't deal well with the quoted-fields version. I would recommend using a dedicated CSV-processing library like the one that ships with Python (both 2 and 3):
import csv
with open('in.csv','r') as infile:
reader = csv.reader(infile)
with open('out.csv','w') as outfile:
writer = csv.writer(outfile,delimiter=',',quotechar='"',quoting=csv.QUOTE_ALL)
writer.writerow(next(reader))
for row in reader:
row[1] = row[1][-3:]
row[2] = row[2][-3:]
writer.writerow(row)
Put the above code into a file named e.g. fixcsv.py
and make the filenames match what you have and want, then just run it with python fixcsv.py
(or python3 fixcsv.py
).
I set it to quote everything in the output (QUOTE_ALL
); if you don't want it to do that, you can set it to QUOTE_MINIMAL
, QUOTE_NONNUMERIC
or QUOTE_NONE
.
The row
assignments replace the second and third fields (row[1]
and row[2]
, since the first field is row[0]
) with their last three characters ([-3:]
). You could also do it arithmetically with e.g. row[1] = int(row[1]) % 1000
.
Upvotes: 2