jxn

Reputation: 8025

awk edit only 1 column with regex

I have a CSV file with 3 columns:

id,text,date
123,hi 你好吗?,2016-01-01
246,this is stackoverflow 我需要帮忙,2016-02-01

I want to edit only column 2, removing the English characters and keeping the Chinese ones. The other columns should remain untouched.

Output I want:

id,text,date
123,你好吗?,2016-01-01
246,我需要帮忙,2016-02-01

Is there a better way to do it than this:

cat myfile.csv|cut -d, -f2|sed 's/[a-zA-Z]*//g' > tmp.csv
paste -d, myfile.csv tmp.csv|awk -F, '{OFS=",";print $1,$4,$3}' >tmp2.csv

Upvotes: 4

Views: 197

Answers (4)

Hackaholic

Reputation: 19733

awk 'NR==1{print;}NR>1{gsub(/[a-zA-Z ]+/,"");print;}' your_file
id,text,date
123,你好吗?,2016-01-01
246,我需要帮忙,2016-02-01

Upvotes: 0

Ed Morton

Reputation: 203229

If the script you posted at the bottom of your question works for you then so will this:

awk 'BEGIN{FS=OFS=","} NR>1{gsub(/[a-zA-Z]/,"",$2)} 1' file

You said "characters" though, not "letters", so YMMV.
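As a rough sketch of what that caveat means in practice: the one-liner above removes letters only, so the space that followed them stays in column 2. The variant below also drops the spaces trailing each run of letters. The function name strip_latin is made up for illustration, and the sketch assumes the English text always comes before the Chinese text, as in the sample data.

```shell
# Hypothetical helper: remove ASCII letters (and the spaces right after them)
# from column 2 only, leaving the header line and the other columns alone.
strip_latin() {
  awk 'BEGIN { FS = OFS = "," }
       # NR > 1 skips the header; the third gsub argument limits the edit to $2
       NR > 1 { gsub(/[a-zA-Z]+ */, "", $2) }
       { print }' "$@"
}

# Example run on the sample rows from the question
printf '%s\n' \
  'id,text,date' \
  '123,hi 你好吗?,2016-01-01' \
  '246,this is stackoverflow 我需要帮忙,2016-02-01' | strip_latin
```

Digits and punctuation in column 2 are left in place, which may or may not be what "characters" was meant to cover.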

Upvotes: 2

bian

Reputation: 1456

awk -F, 'BEGIN { OFS="," } { n=split($2,t," "); $2=t[n]; print }' file
id,text,date
123,你好吗?,2016-01-01
246,我需要帮忙,2016-02-01

Upvotes: 0

Fabricator

Reputation: 12772

awk -F, 'BEGIN {OFS=","} { if (NR>1) {gsub(/[\x00-\x7F]/, "", $2)}; print }' test.txt
  • NR>1: skip the first (header) line
  • gsub(/[\x00-\x7F]/, "", $2): remove every ASCII character from column 2
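One thing worth seeing on the sample data is that an ASCII-range match also removes ASCII punctuation, so the "?" after 你好吗 disappears along with the letters. The sketch below is a variant, not the command above: it uses the plain range [ -~] (space through tilde, that is, printable ASCII) instead of \x escapes, on the assumption that some awk implementations may not accept \x in a regex, and it sets LC_ALL=C so the range is compared byte by byte. The function name strip_ascii is hypothetical.

```shell
# Hypothetical helper: delete printable ASCII characters from column 2 only.
strip_ascii() {
  LC_ALL=C awk 'BEGIN { FS = OFS = "," }
       # [ -~] spans space (0x20) through tilde (0x7E); UTF-8 bytes of the
       # Chinese text are all 0x80 or higher, so they pass through untouched
       NR > 1 { gsub(/[ -~]/, "", $2) }
       { print }' "$@"
}

# Example run: the ASCII "?" in the first data row is removed as well
printf '%s\n' \
  'id,text,date' \
  '123,hi 你好吗?,2016-01-01' \
  '246,this is stackoverflow 我需要帮忙,2016-02-01' | strip_ascii
```

If the punctuation has to survive, a letters-only character class is the safer choice.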

Upvotes: 3
