n1k31t4

Reputation: 2874

How to use the unix/shell paste command for several files

I have five CSV files that I would like to paste together using the shell's paste command. This joins the corresponding rows of several text files side by side. What I am after is shown in example 8 of this tutorial.

I am doing this from Python via subprocess.call(); however, running it directly in the terminal produces the same confusing results.

My files are all tab delimited (tab is the default delimiter of the paste command).

When I use the command on 2, 3, ... n files, it seems as though the headers of the second to nth files are added as extra rows, with only the header of the first file appearing in the first row.

Here is my command:

paste outfile.txt tmp_1.txt tmp_2.txt tmp_3.txt tmp_4 > final.txt

Here is the output:

col1    col2    col3               # <-- 1st file has 3 columns
col4    col5                       # <-- 2nd file has 2 columns
col6                               # <-- 3rd file has 1 column
col7                               # <-- 4th file has 1 column
col8    col9                       # <-- 5th file has 2 columns

After this, however, the rows carry on in a different fashion (consistently to the end of the files):

col1    col2    col3
col4    col5    col6    col6    col7    col8    col9
col1    col2    col3
col4    col5    col6    col6    col7    col8    col9

[Those two code blocks follow on from each other]

I can't find any more options I could specify in this documentation, and explicitly entering -d'\t' doesn't change anything. I have also tried fewer or more files and changing the order of the files (in case my first one has some carriage returns etc. in it), but the results are always the same.

Update #1

Here is a piece of the output from the command recommended by @shellter in the comments: cat -vet file1.txt file2.txt ... file5.txt | less :

Col1^ICol2^ICol3^M$
Some text was here^I2^I-3^M$
Some text was here^I2^I-1^M$
Some text was here^I2^I-2^M$
Some text was here^I2^I-1^M$

You can see the ^I markers for the tabs, and ^M (carriage return) followed by $ (end of line) at the end of each line, i.e. CRLF line endings.
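To make the effect of those ^M markers on paste concrete, here is a minimal sketch that reproduces the situation with two tiny CRLF-terminated files. The file names (a.txt, b.txt) and their contents are made up for illustration and are not the files from the question.

```shell
# Build two small CRLF-terminated files (hypothetical names) in a scratch directory.
workdir=$(mktemp -d)
printf 'col1\tcol2\r\nA\tB\r\n' > "$workdir/a.txt"
printf 'col3\r\nC\r\n' > "$workdir/b.txt"

# paste joins line N of each file with a tab. It only treats \n as the line
# terminator, so the \r from a.txt ends up in the middle of each output line.
pasted=$(paste "$workdir/a.txt" "$workdir/b.txt" | cat -vet)
printf '%s\n' "$pasted"
```

Each output line contains a ^M before the tab that paste inserted, so the carriage return sits in the middle of the merged row. A terminal or editor that honours that carriage return will not show the row as one clean line.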

Update #2

Having applied the dos2unix utility to my files:

dos2unix file1.txt file2.txt ... file5.txt

the paste command I used originally works as expected. Inspecting the final file shows that only the useful markers remain. Here is the desired output, achieved:

col1    col2    col3    col4    col5    col6    col6    col7    col8    col9
col1    col2    col3    col4    col5    col6    col6    col7    col8    col9
col1    col2    col3    col4    col5    col6    col6    col7    col8    col9

And here is the output from the command used to inspect the files, cat -vet file1.txt ... :

Col1^ICol2^ICol3^ICol4^ICol5^ICol6^ICol7^ICol8^ICol9$
Col1^ICol2^ICol3^ICol4^ICol5^ICol6^ICol7^ICol8^ICol9$
Col1^ICol2^ICol3^ICol4^ICol5^ICol6^ICol7^ICol8^ICol9$

No ^M markers to be found.
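If dos2unix is not installed, the same cleanup can probably be done with standard tools. The sketch below uses tr to delete carriage returns before pasting. The file names and the .clean suffix are hypothetical, chosen only for this example. Note that tr -d '\r' removes every carriage return in the file, not just those at the end of a line, which is assumed to be acceptable here.

```shell
# Recreate two small CRLF-terminated files (hypothetical names) to work on.
workdir=$(mktemp -d)
printf 'col1\tcol2\r\nA\tB\r\n' > "$workdir/a.txt"
printf 'col3\r\nC\r\n' > "$workdir/b.txt"

# Write a cleaned copy of each file with all carriage returns deleted.
for f in "$workdir/a.txt" "$workdir/b.txt"; do
    tr -d '\r' < "$f" > "$f.clean"
done

# Paste the cleaned copies and inspect the result for leftover ^M markers.
cleaned=$(paste "$workdir/a.txt.clean" "$workdir/b.txt.clean" | cat -vet)
printf '%s\n' "$cleaned"
```

Writing to separate .clean copies keeps the originals untouched, which makes it easy to compare before and after.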

Upvotes: 3

Views: 713

Answers (1)

Jonathan Leffler

Reputation: 754130

Transferring some comments into a (Community Wiki) answer.

Jonathan Leffler commented:

Have you got any DOS line endings confusing things? That is, do the files have CRLF line endings?

And shellter commented:

Use cat -vet file ... file | less and look for ^M at the end of each line.

You confirmed that this was indeed the source of trouble.
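As a complement to eyeballing the cat -vet output, the following sketch lists which files in a set contain a carriage return at all. The file names (dos.txt, unix.txt) are hypothetical, and the cr variable is just a portable way to hand a literal carriage return to grep.

```shell
# Create one CRLF file and one LF-only file (hypothetical names) for comparison.
workdir=$(mktemp -d)
printf 'col1\r\nA\r\n' > "$workdir/dos.txt"
printf 'col2\nB\n' > "$workdir/unix.txt"

# Hold a literal carriage return; command substitution strips only trailing newlines.
cr=$(printf '\r')

# grep -l prints just the names of files with at least one matching line.
with_cr=$(cd "$workdir" && grep -l "$cr" dos.txt unix.txt)
printf '%s\n' "$with_cr"
```

Only the files named in the output would need converting before being passed to paste.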

Upvotes: 1
