Reputation: 11
I have two CSV files of simulated patient data that I need to read in and compare.
Without using pandas, I need to sort the second file by Subject_ID and append each patient's sex to the first CSV file. I don't know where to start. Any ideas?
So far, my plan is to use a dictionary somehow to regroup the second file.
```python
import csv

with open('Patient_Sex.csv', 'r') as file_sex, open('Patient_FBG.csv', 'r') as file_fbg:
    patient_reader = csv.DictReader(file_sex)
    fbg_reader = csv.DictReader(file_fbg)
```
After this, it gets really muddy for me.
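The dictionary plan works: read the file that holds the sex column once into a dict keyed by subject ID, then look up each row of the other file in it. A minimal sketch, assuming both files have a header row with a SUBJECT_ID column, that the sex file has a SEX column, and that the FBG file is the one that should gain the new column. The column names and the helper name merge_sex are hypothetical, since the actual headers were not posted:

```python
import csv
import io


def merge_sex(fbg_file, sex_file, key="SUBJECT_ID"):
    """Return the rows of fbg_file as dicts, each with an added "SEX" field.

    Hypothetical helper: the column names are assumptions, so adjust
    `key` and "SEX" to match the real headers.
    """
    # One pass over the sex file builds the lookup table: subject ID -> sex.
    sex_by_id = {row[key]: row["SEX"] for row in csv.DictReader(sex_file)}

    merged = []
    for row in csv.DictReader(fbg_file):
        # .get() leaves SEX empty for IDs that are missing from the sex file.
        row["SEX"] = sex_by_id.get(row[key], "")
        merged.append(row)
    return merged


# Hypothetical sample data standing in for the two files.
fbg = io.StringIO("SUBJECT_ID,YEAR_1\n2,5.4\n1,6.1\n3,4.9\n")
sex = io.StringIO("SUBJECT_ID,SEX\n1,F\n2,M\n")

rows = merge_sex(fbg, sex)
# Optional: order the result by numeric subject ID.
rows.sort(key=lambda r: int(r["SUBJECT_ID"]))
for r in rows:
    print(r)
```

With the real files you would pass open file objects instead of the StringIO samples, ideally opened with newline='' as the csv module documentation recommends. Because the lookup goes through a dict, neither file has to be sorted for the join itself; the sort is only there if you want ordered output.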
Upvotes: 0
Views: 126
Reputation: 9639
You can do this without importing any modules: read both CSV files as lists of lines, then append the sex to each line of the main file whose subject ID matches a line in the second file:
```python
with open('test1.csv') as csvfile:
    main_csv = [i.rstrip() for i in csvfile.readlines()]
with open('test2.csv') as csvfile:
    second_csv = [i.rstrip() for i in csvfile.readlines()]

for n, i in enumerate(main_csv):
    if n == 0:
        # Header row: add the new column name.
        main_csv[n] = main_csv[n] + ',SEX'
    else:
        patient = i.split(',')[0]
        # Compare the whole ID field so that ID 1 does not match a line for ID 10.
        hits = [line.split(',')[-1] for line in second_csv if line.split(',')[0] == patient]
        if hits:
            main_csv[n] = main_csv[n] + ',' + hits[0]
        else:
            main_csv[n] = main_csv[n] + ','

with open('test.csv', 'w') as f:
    f.write('\n'.join(main_csv))
```
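The hits list comprehension rescans the whole second file for every patient, so the work grows with the product of the two file sizes. A module-free variant of the same idea builds a dictionary once instead. This sketch keeps the answer's assumptions (plain comma-separated lines with no quoted fields, the subject ID in the first column of both files, the sex in the last column of the second file, a header line in each) and wraps the logic in a hypothetical helper, append_sex_column, that works on the lists of lines. Comparing the whole first field, rather than a prefix, also keeps ID 1 from matching a line for ID 10.

```python
def append_sex_column(main_lines, second_lines):
    """Return main_lines with a SEX column appended (hypothetical helper).

    Assumes plain comma-separated lines with no quoted fields, the ID in
    column 0 of both files, and the sex in the last column of the second
    file. Both lists start with a header line.
    """
    # Skip the second file's header and map ID -> sex in a single pass.
    sex_by_id = {}
    for line in second_lines[1:]:
        fields = line.split(',')
        # Keep the first match for an ID, like hits[0] in the loop version.
        sex_by_id.setdefault(fields[0], fields[-1])

    result = [main_lines[0] + ',SEX']
    for line in main_lines[1:]:
        patient = line.split(',')[0]
        # Unknown IDs get an empty SEX field.
        result.append(line + ',' + sex_by_id.get(patient, ''))
    return result


main = ['SUBJECT_ID,YEAR_1', '1,6.1', '10,5.0', '2,5.4']
second = ['SUBJECT_ID,SEX', '10,M', '1,F']
print('\n'.join(append_sex_column(main, second)))
```

To use it with the files from the answer, pass main_csv and second_csv and write the returned list the same way.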
Upvotes: 0
Reputation: 46
I think this is what you are looking for, assuming your files are .csv files like the data you posted.
Basically, you can read each file into a list of dictionaries with csv.DictReader (one JSON-like record per row, keyed by the header names), and then the records are easy to manipulate.
```python
import csv

gender_data = []
full_data = []

# Read both files into lists of dicts keyed by the header row.
with open("stack/new.csv", encoding="utf-8", newline="") as csvf:
    csvReader = csv.DictReader(csvf)
    for row in csvReader:
        gender_data.append(row)

with open("stack/info.csv", encoding="utf-8", newline="") as csvf:
    csvReader = csv.DictReader(csvf)
    for row in csvReader:
        full_data.append(row)

# Copy SEX onto every full_data row with the same SUBJECT_ID.
for x in gender_data:
    for y in full_data:
        if x["SUBJECT_ID"] == y["SUBJECT_ID"]:
            y["SEX"] = x["SEX"]

# The with block flushes and closes the output file when writing is done.
with open("stack/test.csv", "w", encoding="utf-8", newline="") as out:
    f = csv.writer(out)
    f.writerow(["SUBJECT_ID", "YEAR_1", "YEAR_2", "YEAR_3", "SEX"])
    for x in full_data:
        f.writerow(
            [
                x["SUBJECT_ID"],
                x["YEAR_1"],
                x["YEAR_2"],
                x["YEAR_3"],
                x["SEX"] if "SEX" in x else "",
            ]
        )
```
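Instead of listing every column by hand, csv.DictWriter can take its field names from the data, and its restval argument fills in any field a row is missing, which covers the "SEX" in x check. A sketch, assuming full_data is a list of dicts as built above and that the column order of the first row should be kept; write_rows is a hypothetical helper name:

```python
import csv
import io


def write_rows(rows, out_file, extra_field="SEX"):
    """Write dict rows to out_file with extra_field as the last column.

    Hypothetical helper; assumes every row has the same keys as rows[0],
    apart from extra_field, which may be missing.
    """
    # Column order comes from the first row; the extra column goes last, once.
    fieldnames = [name for name in rows[0] if name != extra_field] + [extra_field]
    # restval="" writes an empty cell for rows without extra_field.
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"SUBJECT_ID": "1", "YEAR_1": "6.1", "SEX": "F"},
    {"SUBJECT_ID": "2", "YEAR_1": "5.4"},  # no match in the sex file
]
buf = io.StringIO()
write_rows(rows, buf)
print(buf.getvalue())
```

When writing to a real file instead of a StringIO, open it with newline="" as the csv module documentation recommends; otherwise extra blank lines can appear on Windows.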
Upvotes: 1