reader

Reputation: 31

Unix: splitting one column into two columns

I have a dataset as follows:

item  gram
tomato 500
orange 1500
bread  2000

The values look separated (for example, "tomato 500", not "tomato500"), but they all end up in a single column when I make a CSV file. I want the items to go into one column and the grams into another, like this:

item       gram
tomato     500
orange     1500
bread      2000

Upvotes: 1

Views: 1602

Answers (1)

tripleee

Reputation: 189830

To change runs of spaces to a tab,

tr -s ' ' $'\t' <old.txt >new.tsv

where old.txt is your input file and new.tsv contains the new output from tr.

Your example shows multiple spaces between some values. If you always have exactly one space between values, or want to change n spaces to n tabs, take out the -s (though of course in the former case, it doesn't really matter).
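As a small sketch of what -s changes, here is the "bread" line from the question run through tr both ways. A comma is used as the replacement only so the difference is visible on screen; the variable names are just for illustration.

```shell
# With -s, the run of two spaces collapses into a single separator.
squeezed=$(printf 'bread  2000\n' | tr -s ' ' ',')

# Without -s, each space is replaced one-for-one.
unsqueezed=$(printf 'bread  2000\n' | tr ' ' ',')

printf '%s\n' "$squeezed"     # bread,2000
printf '%s\n' "$unsqueezed"   # bread,,2000
```

Note that -s squeezes repeats of the replacement character in the output, so commas that were already doubled in the input would be collapsed as well.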

$'\t' is a quoting mechanism in Bash (also available in ksh93 and zsh, but not in every sh) which allows you to pass a literal tab as an argument. tr does not have an entirely portable syntax for this, though you could probably use '\t' on most modern systems as well.
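If you would rather not depend on the shell's quoting at all, one possible workaround is to let printf produce the tab and keep it in a variable. This is only a sketch; the variable names tab and tsv are made up for the example, and the input is the first two lines of the sample data.

```shell
# Build a literal tab with printf; command substitution only strips
# trailing newlines, so the tab survives.
tab=$(printf '\t')

# Change runs of spaces to a single tab, as in the command above.
tsv=$(printf 'item  gram\ntomato 500\n' | tr -s ' ' "$tab")

printf '%s\n' "$tsv"
```

The double quotes around "$tab" matter here; unquoted, the shell would split the tab away as whitespace and tr would be missing its second argument.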

The second argument to tr specifies what character to use as the new separator. I chose tab as separator because that's what Excel expected for what it (somewhat confusedly) calls "Unix" files last time I looked. To use commas (for proper CSV) or semicolon (which is apparently preferred over comma in some locales, because they use comma as the decimal separator in numbers; even though Excel still calls the format CSV) use ',' or ';' as the second argument, respectively.
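For illustration, this is roughly what the comma and semicolon variants look like on the sample data from the question. The variable names are only for the example; in practice you would redirect from and to files as shown earlier.

```shell
# The sample data from the question, with its uneven runs of spaces.
sample='item  gram
tomato 500
orange 1500
bread  2000'

# Comma as the separator.
csv_comma=$(printf '%s\n' "$sample" | tr -s ' ' ',')

# Semicolon as the separator; it has to be quoted.
csv_semicolon=$(printf '%s\n' "$sample" | tr -s ' ' ';')

printf '%s\n' "$csv_comma"
printf '%s\n' "$csv_semicolon"
```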

(The single quotes are important to prevent the shell from performing any interpretation. For example, the semicolon character is a command separator in the shell, so you would get a rather bewildering error message if you forget to quote it.)

tr unconditionally replaces all instances of the characters you specify. If you want more detailed control (conditional or context-dependent replacement) you need a more complex tool, like sed.
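As one hedged example of context-dependent replacement, suppose (hypothetically; this is not in the question's data) that some item names contain a space, like "green apple". A sed sketch that only converts the last run of spaces on each line might look like this. The function name last_gap_to_comma is made up, and the pattern assumes there is no trailing whitespace on the line.

```shell
# Replace only the final run of spaces on each line with a comma,
# leaving spaces inside the item name alone.
# Assumes lines do not end with trailing spaces.
last_gap_to_comma() {
    sed 's/  *\([^ ]*\)$/,\1/'
}

result=$(printf 'green apple  300\ntomato 500\n' | last_gap_to_comma)
printf '%s\n' "$result"
```

With tr, the space inside "green apple" would have been turned into a separator too, which is exactly the kind of case where it is too blunt.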

The details of the CSV format are actually rather complex, and there are several dialects, even between different versions of Excel, so you might need to experiment a bit if your real data is more complex than your example.

Upvotes: 1
