turnip424
turnip424

Reputation: 332

Combining 2 csv files based on same column using join

I have been trying to combine two csv files with alphanumeric data based on the column names they share, so that I can perform something like a join over them using terminal.

Here's what I tried: (my first column of both the files are identical)

join -t , -1 1 -2 1 file_1.csv file_2.csv > file_3.csv

The merge happens alright and my columns are combined but not in the format I want.

Issue: The file_3 consists of lines from both the files, though separated by commas, but in separate lines.

Example:

     Columns from file_1
     ,Columns from file_2
     Row1 from file_1
     ,Row1 from file_2
     Row2 from file_1
     ,Row2 from file_2

What do I do to get the file_3 data in one single line for each row-merge? Any pointers to proceed would be appreciated.

file_1.csv: (sample data)

Id,Age,Employment,Education,Marital,Occupation,Income,Gender,Deductions,Hours,Adjusted
1,38,Private,College,Unmarried,Service,81838,Female,0,72,0  
2,35,Private,Associate,Absent,Transport,72099,Male,0,30,0  
3,32,Private,HSgrad,Divorced,Clerical,154676.74,Male,0,40,0

file_2.csv: (sample data)

Id,Adjusted,Predicted_Adjusted,Probability_0,Probability_1
1,0,0,0.952957896225136,0.0470421037748636 . 
2,0,0,0.973664421132328,0.0263355788676716 . 
3,0,0,0.966224074718457,0.0337759252815426

Erroneous join:

Id,Age,Employment,Education,Marital,Occupation,Income,Gender,Deductions,Hours,Adjusted
,Adjusted,Predicted_Adjusted,Probability_0,Probability_1
1,38,Private,College,Unmarried,Service,81838,Female,0,72,0
,0,0,0.952957896225136,0.0470421037748636
2,35,Private,Associate,Absent,Transport,72099,Male,0,30,0
,0,0,0.973664421132328,0.0263355788676716
3,32,Private,HSgrad,Divorced,Clerical,154676.74,Male,0,40,0
,0,0,0.966224074718457,0.0337759252815426

Expected output: Every two lines is actually one row and so the expected output should not have the rows split into two, but should represent a homogenous merge of two csv files i.e file_1 and file_2

Upvotes: 1

Views: 643

Answers (2)

Walter A
Walter A

Reputation: 19982

Are the files with windows line-endings \r?
You can try dos2unix file_1.csv and dos2unix file_2.csv ?

Upvotes: 2

Lohit Gupta
Lohit Gupta

Reputation: 1081

This should work:

join -t , -1 1 -2 1 file_1.csv file_2.csv|paste -d' ' - - > file_3.csv

Upvotes: 1

Related Questions