Reputation: 45

Concate 2 files having same id using python or bash

I have two files:

The first called file 1 : date,name,age

The second called file 2 : date,name,age

Here is an example:

file1.csv:

2015/1/2,Jina,17
2015/1/3,JJ,25
2015/1/4,Carole,8

file2.csv:

2015/1/1,Rouba,14
2015/1/2,GG,78
2015/1/3,James,7
2015/1/4,Elie,15

I need to join the two file with the same date for this example the output should be:

filex.txt:

2015/1/1,Rouba,14
2015/1/2,GG,78,Jina,17
2015/1/3,James,7,JJ,25
2015/1/4,Elie,15,Carole,8

Any help ?

Upvotes: 2

Answers (4)

RavinderSingh13

Reputation: 133780

try:

awk -F, 'NR==FNR{a[$1]=$2 FS $3;next}{printf("%s%s\n",$0,a[$1]?","a[$1]:"");}' file1  file2 > filex

EDIT: Adding a non-one liner form of solution too now with explanation.

awk -F, 'FNR==NR{                                     ###-F is to set field separator, FNR==NR condition will be TRUE when first Input_file will be read. file in this case.
                a[$1]=$2 FS $3;                       ###creating an aray named a whose index is $1 and value is $2 FS $3, where FS is field seprator(space by default)
                next                                  ###next is awks built-in keyword which will skip all the next statements.
                }
                {
                printf("%s%s\n",$0,a[$1]?","a[$1]:"") ###printing the value of $0(current line of file2) and checking if array a value with index 41 is present
                }                                     ###if that is present then print , and array a value with index $1 else print null.
        ' file1 file2 > filex                         ###mentioning file1 and mentioning file2 also here.

Upvotes: 0

RomanPerekhrest

Reputation: 92904

As you're on Linux OS, here's a short one-liner using awk tool:

awk -F, 'NR==FNR{ a[$1]=$2 FS $3; next }{ if($1 in a) $0=$0 OFS a[$1] }1' file1 OFS=',' file2

The output:

2015/1/1,Rouba,14
2015/1/2,GG,78,Jina,17
2015/1/3,James,7,JJ,25
2015/1/4,Elie,15,Carole,8

Upvotes: 0

Nurjan

Reputation: 6073

Without using pandas but a longer solution:

import csv

file1_list = []
with open('file1', 'r') as file1:
    reader = csv.reader(file1)
    file1_list = [item for item in reader]


file2_list = []
with open('file2', 'r') as file2:
    reader = csv.reader(file2)
    file2_list = [item for item in reader]


for item in file1_list:
    print(item[0])


result = []

for item_1 in file1_list:
    for item_2 in file2_list:
        if item_1[0] == item_2[0]:
            item_1.extend(item_2[1:])
            result.append(item_1)

for item_1 in file2_list:
    flag = True
    for item_2 in result:
        if item_1[0] == item_2[0]:
            flag = False
    if flag:
        result.append(item_1)

for item_1 in file1_list:
    flag = True
    for item_2 in result:
        if item_1[0] == item_2[0]:
            flag = False
    if flag:
        result.append(item_1)

print(result)

Upvotes: 0

glegoux

Reputation: 3623

file1.csv:

2015/1/2,Jina,17
2015/1/3,JJ,25
2015/1/4,Carole,8

file2.csv:

2015/1/1,Rouba,14
2015/1/2,GG,78
2015/1/3,James,7
2015/1/4,Elie,15

Your solution:

import pandas as pd

df1 = pd.read_csv('file1.csv', names=["Name", "Age"], index_col=0,
                  header=-1)
df2 = pd.read_csv('file2.csv', names=["Name", "Age"], index_col=0,
                  header=-1)

df = pd.concat([df2, df1], axis=1)

df.to_csv('filex.csv', header=False)

filex.csv:

2015/1/1,,,Rouba,14
2015/1/2,GG,78,Jina,17
2015/1/3,James,7,JJ,25
2015/1/4,Elie,15,Carole,8

If you want to delete multiple commas in filex.csv do:

import re

with open('filex.csv', 'r') as desc:
    filex = re.sub(',+', ',', desc.read())

with open('filex.txt', 'w') as desc:
    desc.write(filex)

filex.txt:

2015/1/1,Rouba,14,
2015/1/2,GG,78,Jina,17
2015/1/3,James,7,JJ,25
2015/1/4,Elie,15,Carole,8

Upvotes: 1

Concate 2 files having same id using python or bash

Answers (4)

Related Questions