Reputation: 45
I have two files:
The first called file 1 : date,name,age
The second called file 2 : date,name,age
Here is an example:
file1.csv:
2015/1/2,Jina,17
2015/1/3,JJ,25
2015/1/4,Carole,8
file2.csv:
2015/1/1,Rouba,14
2015/1/2,GG,78
2015/1/3,James,7
2015/1/4,Elie,15
I need to join the two file with the same date for this example the output should be:
filex.txt:
2015/1/1,Rouba,14
2015/1/2,GG,78,Jina,17
2015/1/3,James,7,JJ,25
2015/1/4,Elie,15,Carole,8
Any help ?
Upvotes: 2
Views: 66
Reputation: 133780
try:
awk -F, 'NR==FNR{a[$1]=$2 FS $3;next}{printf("%s%s\n",$0,a[$1]?","a[$1]:"");}' file1 file2 > filex
EDIT: Adding a non-one liner form of solution too now with explanation.
awk -F, 'FNR==NR{ ###-F is to set field separator, FNR==NR condition will be TRUE when first Input_file will be read. file in this case.
a[$1]=$2 FS $3; ###creating an aray named a whose index is $1 and value is $2 FS $3, where FS is field seprator(space by default)
next ###next is awks built-in keyword which will skip all the next statements.
}
{
printf("%s%s\n",$0,a[$1]?","a[$1]:"") ###printing the value of $0(current line of file2) and checking if array a value with index 41 is present
} ###if that is present then print , and array a value with index $1 else print null.
' file1 file2 > filex ###mentioning file1 and mentioning file2 also here.
Upvotes: 0
Reputation: 92904
As you're on Linux OS, here's a short one-liner using awk tool:
awk -F, 'NR==FNR{ a[$1]=$2 FS $3; next }{ if($1 in a) $0=$0 OFS a[$1] }1' file1 OFS=',' file2
The output:
2015/1/1,Rouba,14
2015/1/2,GG,78,Jina,17
2015/1/3,James,7,JJ,25
2015/1/4,Elie,15,Carole,8
Upvotes: 0
Reputation: 6073
Without using pandas but a longer solution:
import csv
file1_list = []
with open('file1', 'r') as file1:
reader = csv.reader(file1)
file1_list = [item for item in reader]
file2_list = []
with open('file2', 'r') as file2:
reader = csv.reader(file2)
file2_list = [item for item in reader]
for item in file1_list:
print(item[0])
result = []
for item_1 in file1_list:
for item_2 in file2_list:
if item_1[0] == item_2[0]:
item_1.extend(item_2[1:])
result.append(item_1)
for item_1 in file2_list:
flag = True
for item_2 in result:
if item_1[0] == item_2[0]:
flag = False
if flag:
result.append(item_1)
for item_1 in file1_list:
flag = True
for item_2 in result:
if item_1[0] == item_2[0]:
flag = False
if flag:
result.append(item_1)
print(result)
Upvotes: 0
Reputation: 3623
file1.csv:
2015/1/2,Jina,17
2015/1/3,JJ,25
2015/1/4,Carole,8
file2.csv:
2015/1/1,Rouba,14
2015/1/2,GG,78
2015/1/3,James,7
2015/1/4,Elie,15
Your solution:
import pandas as pd
df1 = pd.read_csv('file1.csv', names=["Name", "Age"], index_col=0,
header=-1)
df2 = pd.read_csv('file2.csv', names=["Name", "Age"], index_col=0,
header=-1)
df = pd.concat([df2, df1], axis=1)
df.to_csv('filex.csv', header=False)
filex.csv:
2015/1/1,,,Rouba,14
2015/1/2,GG,78,Jina,17
2015/1/3,James,7,JJ,25
2015/1/4,Elie,15,Carole,8
If you want to delete multiple commas in filex.csv
do:
import re
with open('filex.csv', 'r') as desc:
filex = re.sub(',+', ',', desc.read())
with open('filex.txt', 'w') as desc:
desc.write(filex)
filex.txt:
2015/1/1,Rouba,14,
2015/1/2,GG,78,Jina,17
2015/1/3,James,7,JJ,25
2015/1/4,Elie,15,Carole,8
Upvotes: 1