Bert Colemont
Bert Colemont

Reputation: 59

How to merge first 12 columns?

I have a text file which contains text like:

Somename of someone                                   1234 7894
Even some more name                                   2345 5343
Even more of the same                                 6572 6456
I am a customer                                       1324 7894
I am another customer                                 5612 3657
Also I am a customer and I am number Three            9631 7411
And I am number four and not the latest one in list   8529 9369
And here I am                                         4567 9876

I need to make a CSV file from this, but the problem is that the name contains 12 columns, so I need to merge all from the first 12 columns to 1 column so the CSV file will look like:

Somename of someone,123456,789456
cut -d ' ' -f1-11  test | sed "s/[[:space:]]/\\ /g" | sed "s/\t/\\ /g" > test1

gives me a file with the first 12 columns.

Upvotes: 1

Views: 105

Answers (5)

karakfa
karakfa

Reputation: 67507

a hacky awk

$ awk '{last="," $(NF-1) "," $NF; NF-=2; print $0 last}' file 

Somename of someone,1234,7894
Even some more name,2345,5343
Even more of the same,6572,6456
I am a customer,1324,7894
I am another customer,5612,3657
Also I am a customer and I am number Three,9631,7411
And I am number four and not the latest one in list,8529,9369
And here I am,4567,9876

capture the last two columns, reduce column count which will also normalize spacing; then print.

Upvotes: 0

Alex Harvey
Alex Harvey

Reputation: 15482

If you don't mind using GNU AWK instead, you can do this:

gawk 'BEGIN {FIELDWIDTHS = "54 5 5"; OFS = ","} {print $1, $2, $3}' FILE

Further explanation:

  • You really have 3 columns of fixed width data thus FIELDWIDTHS = "54 5 5"
  • You want the output field separator to be a comma thus OFS = ","

Note that FIELDWIDTHS is a feature of GNU AWK.

If you don't mind the whitespace being preserved in our CSV, then you're done.

Or, if you also need the white space removed, then:

# test.gawk

BEGIN {
  FIELDWIDTHS = "54 5 5"
  OFS = ","
}
{
  for (f=1; f<=NF; f++) {
    sub(/ +$/, "", $f)   # Delete whitespace.
  }
  print
}

Testing:

▶ gawk -f test.gawk FILE
Somename of someone,1234,7894
Even some more name,2345,5343
Even more of the same,6572,6456
I am a customer,1324,7894
I am another customer,5612,3657
Also I am a customer and I am number Three,9631,7411
And I am number four and not the latest one in list,8529,9369
And here I am,4567,9876

(Note that in the second version, as suggested by Ed Morton in comments, I was able to just use print at the end, because we modified the fields which effectively updates $0 and the field separators are replaced by OFS.)

Upvotes: 1

Ed Morton
Ed Morton

Reputation: 203712

With GNU sed for \s/\S shorthand for space/non-space and -E to enable EREs:

$ sed -E 's/\s+(\S+)\s+(\S+)$/,\1,\2/' file
Somename of someone,1234,7894
Even some more name,2345,5343
Even more of the same,6572,6456
I am a customer,1324,7894
I am another customer,5612,3657
Also I am a customer and I am number Three,9631,7411
And I am number four and not the latest one in list,8529,9369
And here I am,4567,9876

and the functional equivalent with any POSIX sed:

$ sed 's/[[:space:]]*\([^[:space:]]\{1,\}\)[[:space:]]*\([^[:space:]]\{1,\}\)$/,\1,\2/' file
Somename of someone,1234,7894
Even some more name,2345,5343
Even more of the same,6572,6456
I am a customer,1324,7894
I am another customer,5612,3657
Also I am a customer and I am number Three,9631,7411
And I am number four and not the latest one in list,8529,9369
And here I am,4567,9876

or with any awk:

$ awk -v OFS=',' '{x=$(NF-1) OFS $NF; sub(/([[:space:]]+[^[:space:]]+){2}$/,""); print $0, x}' file
Somename of someone,1234,7894
Even some more name,2345,5343
Even more of the same,6572,6456
I am a customer,1324,7894
I am another customer,5612,3657
Also I am a customer and I am number Three,9631,7411
And I am number four and not the latest one in list,8529,9369
And here I am,4567,9876

Upvotes: 2

Qeole
Qeole

Reputation: 9144

If the different columns related to the name are part of the same CSV column, and should therefore be left untouched, why not working on the last two columns only?

$ sed 's/\t* *\([0-9]\+\)\t* *\([0-9]\+\)$/,\1,\2/' input_file
Somename of someone,123456,789456
Even some more name,234567,534312
Even more of the same,657212,645613

Upvotes: 1

jklmnop
jklmnop

Reputation: 86

Can you explain are the first columns a name and how many name columns are their maximum, ahead of the numerals? Knowing that can make it relatively easy to add commas and save to a .csv.

tr -s '[:blank:]' ','

Upvotes: 0

Related Questions