HattrickNZ
HattrickNZ

Reputation: 4643

I just want the last 3 characters of a column returned to the original file

first 2lines of my data:

"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06","123427","456060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"

I only want the last 3 characters of column 2 and column 3, I dont want the column header affected. happy for a solution that can do column2 first and then do column 3

I am fiddling with sed and awk at the minute but have no joy yet.

this is what I want:

"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"

edit1 this gives me the last 3 digits(+ "), just need to write this back to the orig file?

$ awk -F"," 'NR>1{ print $2}' head_test_real.csv | sed 's/.*\(....\)/\1/'
427"
592"
007"
592"
409"
742"
387"
731"
556"

edit2 this works but i lose the double quotes "123427" goes to 427, i ould like to keep the double quotes.
* NR>1 works on the rows after the 1st row.

$ awk -F, 'NR>1{$2=substr($2,length($2)-3,3)}1' OFS=, head_test_real.csv
"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06",427,"456060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"

edit3 @Mark tks fro correct answer, and here just for my ref on the quotes.

$ ####csv.QUOTE_ALL

$ cat out.csv
"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"



$ ####csv.QUOTE_MINIMAL

$ cat out.csv
Rec_Open_Date,MSISDN,IMEI,Data_Volume_Bytes,Device_Manufacturer,Device_Model,Product_Description
2015-10-06,427,060,137765,Samsung Korea,Samsung SM-G900I,$39 Plan

$ ###csv.QUOTE_NONNUMERIC

$ cat out.csv
"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"



$ ###csv.QUOTE_NONE

$ cat out.csv
Rec_Open_Date,MSISDN,IMEI,Data_Volume_Bytes,Device_Manufacturer,Device_Model,Product_Description
2015-10-06,427,060,137765,Samsung Korea,Samsung SM-G900I,$39 Plan

Upvotes: 0

Views: 66

Answers (3)

choroba
choroba

Reputation: 241808

Perl to the rescue!

perl -pe 's/",".*?(...",")/","$1/ if $. > 1' < input > output
  • -p reads the input line by line and prints the result
  • s/regex/replacement/ is a substitution
  • .*? matches anything (like .*), but the question mark makes it "frugal", i.e. it matches the shortest string possible
  • (...",") creates a capture group starting three characters before ",", it can be referenced as $1.
  • $. is the line number, no replacement happens on line 1.

Make sure the first two columns are always quoted and the second column is never shorter than 3 characters.

To modify the third column, you can modify the regex to

perl -pe 's/^("(?:.*?","){2}).*?(...",")/$1$2/ if $. > 1'
#                         ~

Modify the indicated number to handle any column you like.

Upvotes: 1

Ed Morton
Ed Morton

Reputation: 203229

$ awk 'BEGIN{FS=OFS="\",\""} NR>1{for (i=2;i<=3;i++) $i=substr($i,length($i)-2)} 1' file
"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"

As with any command, to write back to the original file is just:

command file > tmp && mv tmp file

Upvotes: 1

Mark Reed
Mark Reed

Reputation: 95242

While awk seems like a natural fit for comma-separated data, it doesn't deal well with the quoted-fields version. I would recommend using a dedicated CSV-processing library like the one that ships with Python (both 2 and 3):

import csv
with open('in.csv','r') as infile:
  reader = csv.reader(infile)
  with open('out.csv','w') as outfile:
    writer = csv.writer(outfile,delimiter=',',quotechar='"',quoting=csv.QUOTE_ALL)
    writer.writerow(next(reader)) 

    for row in reader:
      row[1] = row[1][-3:]
      row[2] = row[2][-3:]
      writer.writerow(row)

Put the above code into a file named e.g. fixcsv.py and make the filenames match what you have and want, then just run it with python fixcsv.py (or python3 fixcsv.py).

I set it to quote everything in the output (QUOTE_ALL); if you don't want it to do that, you can set it to QUOTE_MINIMAL, QUOTE_NONNUMERIC or QUOTE_NONE.

The row assignments replace the second and third fields (row[1] and row[2], since the first field is row[0]) with their last three characters ([-3:]). You could also do it arithmetically with e.g. row[1] = int(row[1]) % 1000.

Upvotes: 2

Related Questions