bcsteeve
bcsteeve

Reputation: 1001

Regex for converting spaces to tabs but leaving word items in the middle alone?

I have a problem that my Googling tells me can be solved with Regex, but I'm completely unfamiliar and I tried following some tutorials but I'm entirely lost. I have this sample data set:

59 65 21366 CLEMENTINES 4.89 2.00 9.78
59 61 22384 PORK BACK RIBS 6.50 2.40 15.59
59 65 30669 BANANAS 1.89 1.00 1.89
59 13 391314 KODIAK POWER CAKES 14.69 1.00 14.69
59 65 392373 BAJA CHOPPED SALAD KIT 2.99 1.00 2.99
59 39 429227 FILA MENS ANKLE SOCK 6PK 9.99 1.00 9.99
59 65 1056187 ASIAN CASHEW SALAD KIT 2.99 1.00 2.99
59 28 1159696 SHOPKINS GG/TWOZIES ASST 5.97 1.00 5.97
59 13 1221327 KODIAK POWER CAKES -3.00 -3.00 COUPON
59 14 1270070 KLEENEX ULTRA SOFT 12 PCK 16.49 1.00 16.49
59 21 5221111 10 DRAWER STORAGE CART 29.99 1.00 29.99
59 17 1019 HALF + HALF 1 L 1.99 1.00 1.99

I want to import it into a spreadsheet. Visually I can see what I want (3 numeric columns at the beginning, then a description that may or may not contain spaces, then usually 3 numeric columns, but sometimes 2 + a word (see the line that ends in "coupon").

But because of the spaces and lack of quotes, my Excel skills (which are also marginal) don't allow me to import this in a sensible way.

I thought of doing multiple processes: pull off the 3 columns at the left and then 3 columns at the right... but in Excel I see no way to operate "from the right".

Any help appreciated. Thanks.

[edit] I realize from the comments that my ignorance has resulted in a poor question.

I didn't realize "Regex" was specific to language, etc. I am trying to import a csv into Excel, but I was using Notepad++ to perform the regex operations. I don't know what "flavor" that uses but the answer below helped greatly.

Upvotes: 1

Views: 35

Answers (1)

slim
slim

Reputation: 41223

You can match this with:

^(\S*) (\S*) (\S*) (.*) (\S*) (\S*) (\S*)$
  • ^ matches the start of a line
  • \S* matches one or more non-whitespace characters
  • .* matches anything, including spaces
  • the parentheses capture the matches into capture groups
  • $ matches the end of a line.

You haven't said what tool you intend to use to do this.

One way is with a Perl one-liner:

perl -pe 's/^(\S*) (\S*) (\S*) (.*) (\S*) (\S*) (\S*)$/"\1","\2","\3","\4","\5","\6","\7"/' input.txt

Returning:

"59","65","21366","CLEMENTINES","4.89","2.00","9.78"
...
"59","13","1221327","KODIAK POWER CAKES","-3.00","-3.00","COUPON"
... etc.

Upvotes: 1

Related Questions