Reputation: 1073
So I have been trying to think of how this would be possible, but nothing comes to mind.
I have 2 csv files with dates and data in them. The dates, however, are mismatched. I want to be able to write a code that allows me to loop through all the data, find the matching dates, and then create a text document with the date in common and the 2 data values from the separate sheets. The other alternative is matching them myself in excel.
Any suggestions?
Ex. of what I am trying to accomplish:
Say I have this file1:
01/01/01 | 1
01/02/01 | 2
01/03/01 | 3
01/04/01 | 4
01/05/01 | 5
01/06/01 | 6
01/07/01 | 7
01/08/01 | 8
and this file2:
01/05/01 | 9
01/06/01 | 22
01/07/01 | 33
01/08/01 | 44
01/09/01 | 55
01/10/01 | 66
01/11/01 | 77
01/12/01 | 88
What I would hope to have returned is (after running a loop):
01/05/01 | 5 | 9
01/06/01 | 6 | 22
01/07/01 | 7 | 33
01/08/01 | 8 | 44
Any help would be appreciated, thanks!
Upvotes: 0
Views: 119
Reputation: 1161
I think pandas join logic is well suited for this task since you are effectively asking for an inner join as keyed by date.
First define a parsing function for your dates (which I added in case you needed to treat these as actual dates for further processing in python). Then load each csv file into a separate dataframe. Finally join and drop NaN
values to get the inner join you seek (or just explicitly inner join by setting how
parameter in join
function). You can also easily drop the outptut dataframe into a csv file with the to_csv
function.
If you are looking to get a little more creative with aggregating or joining then pandas is the way to go.
import pandas
parser = lambda date: pandas.datetime.strptime(date.strip(), '%y/%m/%d')
dt = pandas.read_csv('H:\\one.csv', parse_dates = [0], date_parser = parser, index_col=0, delimiter="|", names = ["date", "val"])
dt2 = pandas.read_csv('H:\\two.csv', parse_dates = [0], date_parser = parser, index_col=0, delimiter="|", names = ["date", "val2"])
dt.join(dt2).dropna().to_csv("H:\\output.csv")
This would also work but you lose all of the date-specific functionality within python:
import pandas
dt = pandas.read_csv('H:\\one.csv', index_col=0, delimiter="|", names = ["date", "val"])
dt2 = pandas.read_csv('H:\\two.csv', index_col=0, delimiter="|", names = ["date", "val2"])
dt.join(dt2, how='inner').to_csv("H:\\output.csv")
Upvotes: 1
Reputation: 87084
Use a defaultdict
of lists to store the data from the CSV files, then find those items in the dict that have more than one value in the list.
import csv
from collections import defaultdict
merged = defaultdict(list)
for filename in 'f1.csv', 'f2.csv':
with open(filename) as f:
for date, value in csv.reader(f, delimiter='|'):
merged[date].append(value)
with open('out.csv', 'w') as f:
w = csv.writer(f, delimiter='|')
for date in sorted(merged):
if len(merged[date]) > 1:
w.writerow([date] + merged[date])
This will (almost) produce the file that you require, the only difference being that the delimiter is a single |
instead of |
(with surrounding spaces). I also assumed that the input files are similarly delimited.
Upvotes: 1
Reputation: 8026
Using CSV module to read out both files, convert them to a dictionary which maps date
=> list[values]
, then prints them.
import csv
# Extract CSV
csv1 = []
with open('first.csv', 'r') as f1:
csv_reader = csv.reader(f1)
for row in csv_reader:
csv1.append(row)
csv2 = []
with open('second.csv', 'r') as f2:
csv_reader = csv.reader(f2)
for row in csv_reader:
csv2.append(row)
# Initialize dict which will map dates to values
d = {}
# Map dates to values by getting a default array and appending the values
for row in csv1:
v = d.get(row[0], [])
v.append(row[1])
d[row[0]] = v
for row in csv2:
v = d.get(row[0], [])
v.append(row[1])
d[row[0]] = v
# Print results
for k, v in d.items():
values = ' | '.join(list(map(str, v)))
print(f'{k} | {values}')
Upvotes: 0
Reputation: 77857
Briefly, you're stepping through both files, keeping a "bookmark" in each. If the records match, merge them and print; advance both files. Otherwise, advance the bookmark that's "behind" in the date.
Upvotes: 0