Reputation: 189
I have a CSV file with 1000 rows and 75 columns of mixed data types. I would like to truncate some columns (e.g. columns 46 and 47) to 6 digits after the decimal point, without rounding, and others (e.g. column 6) to 2 digits after the decimal point. Finally, I want to save the original data together with these changes as a .csv file.
Here is some dummy data (1 data row, 6 columns). The first line contains the column names.
col1,col2,col3,col4,col5,col6
id_1,41.76149291933553,-70.06372272425352,United States, "[, ]",0.1333534322353434
Problem: some columns contain the string "[, ]". I know that awk can access columns as $46, $47, but the field separator (a comma) also appears inside that string, so I am not sure how to address the right columns.
Upvotes: 0
Views: 247
Reputation: 5698
Starting from
col1,col2,col3,col4,col5,col6
id_1,41.76149291933553,-70.06372272425352,United States,"[, ]",0.1333534322353434
with Miller (http://johnkerl.org/miller/doc/index.html) you can run
mlr --csv --fs "," cut -f col3,col4,col6 \
then put -S '$col3=gsub($col3,"(\.)([0-9]{3})([0-9]*)","\1\2");$col6=gsub($col6,"(\.)([0-9]{5})([0-9]*)","\1\2")' input
to get
col3,col4,col6
-70.063,United States,0.13335
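Using cut, you extract only the col3, col4 and col6 columns.
Using $col3=gsub($col3,"(\.)([0-9]{3})([0-9]*)","\1\2"), you truncate col3 to 3 digits after the decimal point in this example; col6 is truncated to 5 instead. The {3} and {5} counts in the regular expressions set how many digits are kept, so change them to match the precision you need.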
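For comparison, here is a minimal sketch of the same idea using Python's standard csv module instead of Miller. It is not part of the Miller solution: the helper names truncate_decimals and truncate_columns are made up for this illustration, and it keeps every column rather than cutting to three. The csv reader handles the quoted "[, ]" field, so the embedded comma does not shift the columns.

```python
import csv
import io
import re


def truncate_decimals(value, digits):
    """Cut a decimal string to `digits` fractional digits without rounding."""
    # Keep the dot plus the first `digits` digits after it, drop the rest.
    # Values with fewer fractional digits (or no dot) are returned unchanged.
    return re.sub(r"(\.\d{%d})\d*" % digits, r"\1", value, count=1)


def truncate_columns(src, dst, digits_by_column):
    """Copy CSV rows from src to dst, truncating the named columns.

    digits_by_column maps a column name to the number of digits to keep.
    """
    # skipinitialspace tolerates a blank between the comma and an opening quote.
    reader = csv.DictReader(src, skipinitialspace=True)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for name, digits in digits_by_column.items():
            row[name] = truncate_decimals(row[name], digits)
        writer.writerow(row)


# Demo on the sample data from the question (in memory, no files touched).
sample = (
    "col1,col2,col3,col4,col5,col6\n"
    'id_1,41.76149291933553,-70.06372272425352,United States,"[, ]",0.1333534322353434\n'
)
out = io.StringIO()
truncate_columns(io.StringIO(sample), out, {"col3": 3, "col6": 5})
print(out.getvalue())
```

With real files, you would presumably open the input and a separate output path with open(..., newline="") and pass those handles in, which leaves the original file untouched.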
Upvotes: 1