nrse_i

Reputation: 17

Python: append terms that have the same id but different values to a list?

I have a CSV file with general concepts and their corresponding medical terms or phrases. How can I write a loop that groups all the phrases under their corresponding concept? I'm not very experienced with Python, so I'm not really sure how to write the loop.

id   concept           phrase
--------------------------------
1    general_history   H&P
1    general_history   history and physical
1    general_history   history physical
2    clinic_history    clinic history physical
2    clinic_history    outpatient h p
3    discharge         discharge summary
3    discharge         DCS

For the same concept (or the same id), how can I append the phrases to a list to get something like this:

var = [['general_history', ['H&P', 'history and physical', 'history physical']],
       ['clinic_history', ['clinic history physical', 'outpatient h p']],
       ['discharge', ['discharge summary', 'DCS']]]

Upvotes: 1

Views: 490

Answers (4)

skullgoblet1089

Reputation: 614

Use a for loop and a defaultdict to accumulate the terms.

import csv
from collections import defaultdict

var = defaultdict(list)
# 'terms.csv' is a placeholder path; the file needs a header row
# with 'concept' and 'phrase' columns
with open('terms.csv', newline='') as f:
    records = csv.DictReader(f)
    for row in records:
        concept = row.get('concept')
        if concept is None:
            continue
        phrase = row.get('phrase')
        if phrase is None:
            continue
        var[concept].append(phrase)
print(dict(var))
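To see the loop end to end without a file, here is a minimal sketch that feeds csv.DictReader from an in-memory string and then reshapes the defaultdict into the list-of-lists form the question asks for. It assumes the real file is comma-separated with an id,concept,phrase header (the question shows an aligned table, so the exact delimiter is an assumption); io.StringIO and the helper name group_phrases are illustrative stand-ins.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: assumes the real file is comma-separated
# with an id,concept,phrase header row.
SAMPLE = """id,concept,phrase
1,general_history,H&P
1,general_history,history and physical
1,general_history,history physical
2,clinic_history,clinic history physical
2,clinic_history,outpatient h p
3,discharge,discharge summary
3,discharge,DCS
"""


def group_phrases(f):
    """Group the 'phrase' column by the 'concept' column (hypothetical helper)."""
    grouped = defaultdict(list)
    for row in csv.DictReader(f):
        grouped[row['concept']].append(row['phrase'])
    # dicts keep insertion order (Python 3.7+), so concepts appear
    # in the order they were first seen in the file
    return [[concept, phrases] for concept, phrases in grouped.items()]


var = group_phrases(io.StringIO(SAMPLE))
print(var)
```

With a real file, pass `open('terms.csv', newline='')` instead of the StringIO object; `terms.csv` is a placeholder name.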

Upvotes: 1

S Haskin

Reputation: 31

Hopefully, this answers your question:

# a quick way to transfer the data into Python
csv_string = """id, concept, phrase
1, general_history, H&P
1, general_history, history and physical
1, general_history, history physical
2, clinic_history, clinic history physical
2, clinic_history, outpatient h p
3, discharge, discharge summary
3, discharge, DCS"""

# splits each line into [id, concept, phrase]; the first row is the header
# (named rows rather than csv so it doesn't shadow the csv module)
rows = [[x.strip() for x in line.split(", ")] for line in csv_string.split("\n")]

# makes a dictionary mapping each id to an empty list that will hold its phrases
id_dict = {line[0]: [] for line in rows[1:]}

# appends each phrase to the list for its id
for line in rows[1:]:
    current_id = line[0]
    phrase = line[2]
    id_dict[current_id].append(phrase)

# turns the dictionary into a list of lists containing only unique phrases
# (dict.fromkeys drops duplicates but, unlike set, keeps the original order)
var = [[current_id, list(dict.fromkeys(phrases))] for current_id, phrases in id_dict.items()]
print(var)
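Because the rows in the question are already grouped together, itertools.groupby from the standard library is another option; this is a swapped-in technique, not part of the answer above. It only merges *consecutive* rows with the same key, so unsorted input would need sorting by concept first. A minimal sketch that keys on the concept column (so the output has concept names rather than ids), reusing the same comma-and-space sample string:

```python
from itertools import groupby
from operator import itemgetter

csv_string = """id, concept, phrase
1, general_history, H&P
1, general_history, history and physical
1, general_history, history physical
2, clinic_history, clinic history physical
2, clinic_history, outpatient h p
3, discharge, discharge summary
3, discharge, DCS"""

# split each line into [id, concept, phrase] and drop the header row
rows = [[x.strip() for x in line.split(", ")] for line in csv_string.split("\n")][1:]

# groupby only merges adjacent rows with the same key; this sample is
# already grouped, otherwise use sorted(rows, key=itemgetter(1)) first
var = [
    [concept, [row[2] for row in group]]
    for concept, group in groupby(rows, key=itemgetter(1))
]
print(var)
```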

Upvotes: 0

Ahmad Chaiban

Reputation: 116

If you're using pandas, try filtering. It should look something like this:

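import pandas as pd

new_dataframe1 = dataframe[dataframe['id'] == 1]  # rows with id 1
new_dataframe2 = dataframe[dataframe['id'] == 2]  # rows with id 2

Then concatenate the filtered dataframes:

final_df = pd.concat([new_dataframe1, new_dataframe2], axis=0)

You can do the same thing with the concept column.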
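Filtering one id at a time gets unwieldy as the number of concepts grows. pandas' `groupby` does the grouping in one step; note this is a swapped-in technique rather than the filter-and-concat approach above. A minimal sketch, assuming the data is comma-separated (read from an in-memory string here in place of a real file):

```python
import io

import pandas as pd

# Hypothetical comma-separated version of the question's table
data = """id,concept,phrase
1,general_history,H&P
1,general_history,history and physical
1,general_history,history physical
2,clinic_history,clinic history physical
2,clinic_history,outpatient h p
3,discharge,discharge summary
3,discharge,DCS
"""

df = pd.read_csv(io.StringIO(data))

# collect every phrase for each concept into a list;
# sort=False keeps concepts in order of first appearance
grouped = df.groupby('concept', sort=False)['phrase'].apply(list)

# reshape the Series into the list-of-lists form from the question
var = [[concept, phrases] for concept, phrases in grouped.items()]
print(var)
```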

Upvotes: 0

M Z

Reputation: 4799

Assuming the csv has a header row followed by id, concept and phrase columns, here's how you can group the phrases by concept:

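import csv
from collections import defaultdict

concepts = defaultdict(list)

# parse the csv; 'terms.csv' is a placeholder path
with open('terms.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        _id, concept, phrase = row  # _id avoids shadowing the built-in id()
        concepts[concept].append(phrase)

var = [[k, v] for k, v in concepts.items()]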

var will hold something like this:

[['general_history', ['H&P', 'history and physical', 'history physical']], ...]

It might be even more useful to keep the dictionary itself, so you can look phrases up by concept. It looks something like this:

{
  "general_history": [
    "H&P",
    "history and physical",
    "history physical"
  ],
  ...
}
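If you keep the dictionary, the standard-library json module prints it in that indented layout. A minimal sketch, assuming `concepts` is filled as in the loop above; a few hypothetical pre-parsed rows are hard-coded here so the snippet runs on its own:

```python
import json
from collections import defaultdict

# Hypothetical pre-parsed rows standing in for the csv contents
rows = [
    ('1', 'general_history', 'H&P'),
    ('1', 'general_history', 'history and physical'),
    ('1', 'general_history', 'history physical'),
    ('2', 'clinic_history', 'clinic history physical'),
    ('2', 'clinic_history', 'outpatient h p'),
]

concepts = defaultdict(list)
for _id, concept, phrase in rows:
    concepts[concept].append(phrase)

# look up the phrases for one concept directly by name
print(concepts['clinic_history'])

# defaultdict is a dict subclass, so json can serialize it;
# indent=2 produces the nested layout shown above
text = json.dumps(concepts, indent=2)
print(text)
```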

Upvotes: 0
