Reputation: 4690
I've two CSV file named as file1.csv and file2.csv in file2.csv there is only one column which contain only five records and in file1.csv I've three column which contain more than thousand records I want to get those records which contain in file2.csv for example this is my file1.csv
'A J1, Jhon1',[email protected], A/B-201 Test1
'A J2, Jhon2',[email protected], A/B-202 Test2
'A J3, Jhon3',[email protected], A/B-203 Test3
'A J4, Jhon4',[email protected], A/B-204 Test4
.......and more records
and inside my file2.csv I've only five records right now but in future it can be many
A/B-201 Test1
A/B-2012 Test12
A/B-203 Test3
A/B-2022 Test22
so I've to find records from my file1.csv
at index[2] or index[-1]
this is what I did but it not giving me any output it just returning empty list
import csv
file1 = open('file1.csv','r')
file2 = open('file2.csv','r')
f1 = list(csv.reader(file1))
f2 = list(csv.reader(file2))
new_list = []
for i in f1:
if i[-1] in f2:
new_list.append(i)
print('New List : ',new_list)
it gives me output like this
New List : []
Please help if I did any thing wrong correct me.
Upvotes: 3
Views: 13668
Reputation: 982
in
checks (complexity is O(1) for them, instead of O(N) for lists and tuples).Table
helper for working with table data as with streams (table docs)from convtools import conversion as c
from convtools.contrib.tables import Table
# creating a set of allowed values
allowed_values = {
item[0] for item in Table.from_csv("input2.csv").into_iter_rows(tuple)
}
result = list(
# reading a file with custom quotechar
Table.from_csv("input.csv", dialect=Table.csv_dialect(quotechar="'"))
# stripping last column values
.update(COLUMN_2=c.col("COLUMN_2").call_method("strip"))
# filtering based on allowed values
.filter(c.col("COLUMN_2").in_(c.naive(allowed_values)))
# returning iterable of tuples
.into_iter_rows(tuple)
# # OR outputting csv if needed
# .into_csv("result.csv")
)
"""
>>> In [36]: result
>>> Out[36]:
>>> [('A J1, Jhon1', '[email protected]', 'A/B-201 Test1'),
>>> ('A J3, Jhon3', '[email protected]', 'A/B-203 Test3')]
"""
Upvotes: 0
Reputation: 9681
pandas
This task can be done with relative ease using pandas
. DataFrame documentation here.
In the example below, the two CSV files are read into two DataFrames. The DataFrames are merged using an inner join on the matching columns.
The output shows the merged result.
import pandas as pd
df1 = pd.read_csv('file1.csv', names=['col1', 'col2', 'col3'], quotechar="'", skipinitialspace=True)
df2 = pd.read_csv('file2.csv', names=['match'])
df = pd.merge(df1, df2, left_on=df1['col3'], right_on=df2['match'], how='inner')
The quotechar
and skipinitialspace
parameters are used as the first column in file1
is quoted and contains a comma, and there is leading whitespace after the comma before the last field.
col1 col2 col3
0 A J1, Jhon1 [email protected] A/B-201 Test1
1 A J3, Jhon3 [email protected] A/B-203 Test3
If you choose, the output can easily be written back to a CSV file as:
df.to_csv('path/to/output.csv')
For other DataFrame operations, refer to the documentation linked above.
The method below does not use any libraries, only core Python.
file2
into a list.file1
and search each line to determine if the last value is a match for an item in file2
.Any subsequent data cleaning (if required) will be up to your personal requirements or use-case.
output = []
# Read the matching values into a list.
with open('file2.csv') as f:
matches = [i.strip() for i in f]
# Iterate over file1 and place any matches into the output.
with open('file1.csv') as f:
for i in f:
match = i.split(',')[-1].strip()
if any(match == j for j in matches):
output.append(i)
["'A J1, Jhon1',[email protected], A/B-201 Test1\n",
"'A J3, Jhon3',[email protected], A/B-203 Test3\n"]
Upvotes: 4