I have five CSV files that I would like to paste together using the paste command, which joins corresponding rows of several text files side by side. What I am after is shown in example 8 in this tutorial.
I am doing this from Python via subprocess.call(), but running the command directly in a terminal produces the same confusing results.
My files are all tab-delimited (tab is the default delimiter of paste).
When I paste 2, 3, ... n files, the headers of the second to n-th files seem to be added as a second row, and only the header of the first file appears in the first row.
Here is my command:
paste outfile.txt tmp_1.txt tmp_2.txt tmp_3.txt tmp_4 > final.txt
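The question doesn't show the exact subprocess.call() line. If it used an argument list (no shell=True), note that "> final.txt" is not interpreted by any shell, so ">" would reach paste as a literal file name. A minimal sketch of running paste from Python with the redirection done through the stdout argument instead; the helper name paste_files() is hypothetical:

```python
import subprocess


def paste_files(inputs, output):
    """Run `paste` on the input files and write its stdout to `output`.

    With an argument list (no shell=True) there is no shell to handle
    "> output", so the output file is passed via stdout= instead.
    """
    with open(output, "wb") as out:
        # check=True raises CalledProcessError if paste exits non-zero.
        subprocess.run(["paste", *inputs], stdout=out, check=True)


# Hypothetical usage:
# paste_files(["outfile.txt", "tmp_1.txt", "tmp_2.txt"], "final.txt")
```

This only changes how the command is launched; as the question says, the terminal gives the same result, so the cause lies in the files themselves.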
Here is the output:
col1 col2 col3 # <-- 1st file has 3 columns
col4 col5 # <-- 2nd file has 2 columns
col6 # <-- 3rd file has 1 column
col7 # <-- 4th file has 1 column
col8 col9 # <-- 5th file has 2 columns
After this, however, the rows carry on in a different fashion (consistently to the end of the files):
col1 col2 col3
col4 col5 col6 col6 col7 col8 col9
col1 col2 col3
col4 col5 col6 col6 col7 col8 col9
[The two output blocks above are consecutive: the second continues where the first ends.]
I can't find any other options to specify in this documentation, and explicitly passing -d'\t' doesn't change anything. I have also tried using fewer or more files and changing the order of the files (in case my first one has some carriage returns etc. in it), but the results are always the same.
Update #1
Here is part of the output of the command recommended by @shellter in the comments (cat -vet file1.txt file2.txt ... file5.txt | less):
Col1^ICol2^ICol3^M$
Some text was here^I2^I-3^M$
Some text was here^I2^I-1^M$
Some text was here^I2^I-2^M$
Some text was here^I2^I-1^M$
You can see the ^I markers for the tabs, and ^M followed by $ at the end of each line: ^M is a carriage return and $ marks the end of the line (the newline).
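To see why CRLF input breaks the pasted rows, here is a simplified Python model of paste's default mode: line i of every input is joined with a tab. The sketch assumes each file is split on "\n" only, which is what paste does, so a CRLF file keeps a "\r" at the end of every line. The helper paste_lines() is hypothetical and ignores paste's other options:

```python
from itertools import zip_longest


def paste_lines(*files_lines):
    """Join line i of every input with a tab, like paste's default mode.

    Each input is a list of lines without the trailing "\n"; when one input
    runs out of lines, its field is left empty, as paste does.
    """
    return ["\t".join(row) for row in zip_longest(*files_lines, fillvalue="")]


# CRLF files split on "\n" keep a "\r" at the end of every line.
file1 = "Col1\tCol2\tCol3\r\n".split("\n")[:-1]  # ['Col1\tCol2\tCol3\r']
file2 = "Col4\tCol5\r\n".split("\n")[:-1]        # ['Col4\tCol5\r']

print(repr(paste_lines(file1, file2)[0]))
# 'Col1\tCol2\tCol3\r\tCol4\tCol5\r'
```

The carriage return from the first file now sits in the middle of the output line, before the second file's columns. Depending on how the viewer treats a lone "\r" (as a line break, or by moving the cursor back to the start of the line), the later files' columns end up on a separate row or overwrite the earlier ones.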
Update #2
After running dos2unix on my files:
dos2unix file1.txt file2.txt ... file5.txt
the paste command I used originally works as expected. Inspecting the final file shows that only the expected markers (tabs and line ends) remain. Here is the desired output:
col1 col2 col3 col4 col5 col6 col6 col7 col8 col9
col1 col2 col3 col4 col5 col6 col6 col7 col8 col9
col1 col2 col3 col4 col5 col6 col6 col7 col8 col9
And here is the output of the inspection command (cat -vet file1.txt ...):
Col1^ICol2^ICol3^ICol4^ICol5^ICol6^ICol7^ICol8^ICol9$
Col1^ICol2^ICol3^ICol4^ICol5^ICol6^ICol7^ICol8^ICol9$
Col1^ICol2^ICol3^ICol4^ICol5^ICol6^ICol7^ICol8^ICol9$
No ^M markers are to be found.
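If the files are produced from Python anyway, the CRLF-to-LF step can be done there before calling paste. A rough sketch, not a full replacement for dos2unix: the hypothetical crlf_to_lf() helper only rewrites "\r\n" pairs as "\n" and leaves everything else (encodings, lone "\r" characters) untouched:

```python
def crlf_to_lf(path):
    """Rewrite `path` in place, turning CRLF line endings into LF.

    Works on raw bytes so no encoding has to be assumed; lone "\r"
    characters that are not followed by "\n" are left as they are.
    """
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(data.replace(b"\r\n", b"\n"))


# Hypothetical usage, before running paste:
# for name in ["file1.txt", "file2.txt"]:
#     crlf_to_lf(name)
```

Reading the files in Python text mode with the default newline=None also translates "\r\n" to "\n" on input, which helps if the pasting itself is done in Python rather than with the paste command.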
Transferring some comments into a (Community Wiki) answer.
Jonathan Leffler commented:
Have you got any DOS line endings confusing things? That is, do the files have CRLF line endings?
And shellter commented:
Use cat -vet file ... file | less and look for ^M at the end of each line.
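The same check can be approximated in Python, for example to assert inside a script that no CR characters remain. This is a simplified sketch: the hypothetical show_invisibles() helper only renders tabs, carriage returns and newlines the way cat -vet does, while the real cat -v also escapes other control and non-ASCII bytes:

```python
def show_invisibles(text):
    """Roughly mimic `cat -vet` for tabs and line endings only."""
    # Tab -> ^I, carriage return -> ^M, newline -> "$" before the newline.
    return text.replace("\t", "^I").replace("\r", "^M").replace("\n", "$\n")


print(show_invisibles("Col1\tCol2\tCol3\r\n"), end="")
# Col1^ICol2^ICol3^M$
```

A line that ends in ^M$ in this output has a DOS (CRLF) line ending, matching what the question's Update #1 showed.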
You confirmed that this was indeed the source of trouble.