Naz

Reputation: 189

Text processing a CSV file: truncate digits without rounding

I have a CSV file with 1000 rows and 75 columns. The columns have different data types. I would like to truncate some columns (e.g. columns 46 and 47) to 6 digits after the decimal point, without rounding, and others (e.g. column 6) to 2 digits after the decimal point. The result, the original data with these changes applied, should be saved as a .csv file.

Here is some dummy data (1 data row, 6 columns). The first line contains the column names.

col1,col2,col3,col4,col5,col6

id_1,41.76149291933553,-70.06372272425352,United States, "[, ]",0.1333534322353434

Problem: some columns contain the string "[, ]". I know that awk can access columns as $46, $47 and so on, but the field separator (a comma) is also part of that string, so I am not sure how to get the proper columns.

Upvotes: 0

Views: 247

Answers (1)

aborruso

Reputation: 5698

Starting from

col1,col2,col3,col4,col5,col6
id_1,41.76149291933553,-70.06372272425352,United States,"[, ]",0.1333534322353434

with Miller (http://johnkerl.org/miller/doc/index.html) you can run

mlr --csv --fs "," cut -f col3,col4,col6 \
then put -S '$col3=gsub($col3,"(\.)([0-9]{3})([0-9]*)","\1\2");$col6=gsub($col6,"(\.)([0-9]{5})([0-9]*)","\1\2")' input

to have

col3,col4,col6
-70.063,United States,0.13335

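The cut verb extracts only the columns col3, col4 and col6.

The expression $col3=gsub($col3,"(\.)([0-9]{3})([0-9]*)","\1\2") keeps the decimal point and the first 3 digits after it and drops the rest, so $col3 is truncated to 3 decimal digits without rounding. In the same way, col6 is truncated to 5. Change the number in braces to set how many digits are kept.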
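If Miller is not available, the same idea can be sketched with Python's standard csv module, which also treats the quoted "[, ]" as a single field. This is only a rough sketch, not a drop-in solution: the helper names truncate_decimals and truncate_columns are made up for illustration, and it assumes the quoted field has no space before the opening quote, as in the sample above.

```python
import csv
import io
import re


def truncate_decimals(value: str, digits: int) -> str:
    """Keep at most `digits` digits after the decimal point, without rounding."""
    # Same pattern as the gsub call: keep the dot and the first `digits`
    # digits, drop any digits that follow. Values with fewer decimal digits
    # do not match and are returned unchanged.
    return re.sub(r"(\.)([0-9]{%d})[0-9]*" % digits, r"\1\2", value)


def truncate_columns(text: str, spec: dict) -> str:
    """Apply truncate_decimals to the columns named in `spec` (name -> digits)."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for col, digits in spec.items():
            row[col] = truncate_decimals(row[col], digits)
        writer.writerow(row)  # fields containing a comma are re-quoted
    return out.getvalue()


sample = (
    "col1,col2,col3,col4,col5,col6\n"
    'id_1,41.76149291933553,-70.06372272425352,United States,"[, ]",0.1333534322353434\n'
)

# Hypothetical choice of columns and digit counts, matching the Miller example.
result = truncate_columns(sample, {"col3": 3, "col6": 5})
print(result)
```

For a real file you would read the input with open(..., newline="") and write the result to a new file, so the original stays untouched. Unlike the cut step above, this sketch keeps all columns.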

Upvotes: 1
