Reputation: 17
I have a CSV file with general concepts and their corresponding medical terms or phrases. How can I write a loop that groups all the phrases under their corresponding concept? I'm not very experienced with Python, so I'm not really sure how to write the loop.
id concept phrase
--------------------------------
1 general_history H&P
1 general_history history and physical
1 general_history history physical
2 clinic_history clinic history physical
2 clinic_history outpatient h p
3 discharge discharge summary
3 discharge DCS
For the same concept term (or same ID) how can I append the phrases to a list to get something like this:
var = [[general_history, ['history and physical', 'history physical']],
[clinic_history, ['clinic history physical', 'outpatient h p']],
[discharge, ['discharge summary', 'DCS']]]
Upvotes: 1
Views: 490
Reputation: 614
Use a for loop and a defaultdict to accumulate the phrases for each concept.
import csv
from collections import defaultdict

var = defaultdict(list)
records = ... # read csv with csv.DictReader
for row in records:
    concept = row.get('concept', None)
    if concept is None: continue
    phrase = row.get('phrase', None)
    if phrase is None: continue
    var[concept].append(phrase)
print(var)
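The records placeholder could be filled in roughly like this. This is only a sketch, assuming the file is comma-separated with a header row of id, concept, phrase; the sample_csv string is a hypothetical stand-in for the real file, and the names grouped and result are mine, not from the question.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the real file; assumes comma-separated columns
# with the header row id,concept,phrase.
sample_csv = """id,concept,phrase
1,general_history,H&P
1,general_history,history and physical
1,general_history,history physical
2,clinic_history,clinic history physical
2,clinic_history,outpatient h p
3,discharge,discharge summary
3,discharge,DCS
"""

grouped = defaultdict(list)
# For a file on disk, pass an open file object instead of io.StringIO(...),
# e.g. one opened with newline="" as the csv module recommends.
for row in csv.DictReader(io.StringIO(sample_csv)):
    concept = row.get("concept")
    phrase = row.get("phrase")
    if concept is None or phrase is None:
        continue  # skip short or malformed rows
    grouped[concept].append(phrase)

# Convert the dictionary to the list-of-lists shape asked for in the question.
result = [[concept, phrases] for concept, phrases in grouped.items()]
print(result)
```

Since dictionaries keep insertion order, the concepts come out in the order they first appear in the file.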
Upvotes: 1
Reputation: 31
Hopefully, this will solve your question:
# a quick way to transfer the data into python
csv_string = """id, concept, phrase
1, general_history, H&P
1, general_history, history and physical
1, general_history, history physical
2, clinic_history, clinic history physical
2, clinic_history, outpatient h p
3, discharge, discharge summary
3, discharge, DCS"""
# formats the data as shown in the original question
csv=[[x.strip() for x in line.split(", ")] for line in csv_string.split("\n")]
# makes a dictionary with an empty list that will hold all data points
id_dict = {line[0]:[] for line in csv[1:]}
# iterates and adds all possible combinations of id's and phrases
for line in csv[1:]:
    current_id = line[0]
    phrases = line[2]
    id_dict[current_id].append(phrases)
# makes the data into a list of lists containing only unique phrases
[[current_id, list(set(phrases))] for current_id, phrases in id_dict.items()]
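Two things to be aware of with the code above: the result is keyed by id rather than by concept name, and set() does not preserve the original order of the phrases. If either matters, a variant along these lines might help. It is only a sketch; the rows list is a hypothetical, already-parsed copy of the data (with one duplicate row added purely for illustration), and by_concept and unique are names I made up.

```python
# Hypothetical pre-parsed rows: (id, concept, phrase).
rows = [
    ("1", "general_history", "H&P"),
    ("1", "general_history", "history and physical"),
    ("1", "general_history", "history physical"),
    ("2", "clinic_history", "clinic history physical"),
    ("2", "clinic_history", "outpatient h p"),
    ("3", "discharge", "discharge summary"),
    ("3", "discharge", "DCS"),
    ("3", "discharge", "DCS"),  # duplicate added for illustration only
]

# Group phrases under the concept name instead of the id.
by_concept = {}
for _row_id, concept, phrase in rows:
    by_concept.setdefault(concept, []).append(phrase)

# dict.fromkeys keeps the first occurrence of each phrase and preserves order,
# unlike set(), whose iteration order is arbitrary.
unique = [
    [concept, list(dict.fromkeys(phrases))]
    for concept, phrases in by_concept.items()
]
print(unique)
```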
Upvotes: 0
Reputation: 116
If you're using pandas, try filtering. It should look something like this:
new_dataframe = dataframe[dataframe['id'] == id]
Then concatenate the filtered DataFrames:
final_df = pd.concat([new_dataframe1, new_dataframe2], axis = 0)
You can try to do the same thing for concept as well.
Upvotes: 0
Reputation: 4799
Assuming you can already parse the CSV, here's how you can go about grouping the phrases by concept:
from collections import defaultdict
concepts = defaultdict(list)
""" parse csv """
for row in csv:
    id, concept, phrase = row
    concepts[concept].append(phrase)
var = [[k, concepts[k]] for k in concepts.keys()]
var will hold something like this:
[['general_history', ['history and physical', 'history physical']], ...]
It might be even more useful to keep the dictionary itself, since concepts lets you look up phrases by key and looks something like this:
{
    "general_history": [
        "history and physical",
        "history physical",
    ],
    ...
}
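If the file really is whitespace-separated as displayed in the question, rather than comma-separated, the parsing step could be sketched as below. This assumes a header line followed by a dashed rule, and that concept names never contain spaces; table_text and parse_rows are hypothetical names used only for this example.

```python
from collections import defaultdict

# Hypothetical copy of the table as it is displayed in the question.
table_text = """id concept phrase
--------------------------------
1 general_history H&P
1 general_history history and physical
1 general_history history physical
2 clinic_history clinic history physical
2 clinic_history outpatient h p
3 discharge discharge summary
3 discharge DCS"""


def parse_rows(text):
    """Return (id, concept, phrase) tuples, skipping the header and rule."""
    rows = []
    for line in text.splitlines()[2:]:
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first two runs of whitespace only, so the phrase
        # keeps its internal spaces.
        row_id, concept, phrase = line.split(None, 2)
        rows.append((row_id, concept, phrase.strip()))
    return rows


concepts = defaultdict(list)
for row_id, concept, phrase in parse_rows(table_text):
    concepts[concept].append(phrase)

var = [[k, v] for k, v in concepts.items()]
print(var)
```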
Upvotes: 0