Luca

Reputation: 11026

Filter a CSV file with pandas

I have a CSV file where each row holds some data about a particular patient and a single patient can have multiple rows associated with him or her.

The file contains thousands of patient records. I want to randomly select 100 patients, get all the records associated with them, and save those records to another CSV file.

For example, the file could look like this:

patient_id   Date          Diagnosis   Comments
001-001      23.12.2008    Normal      Normal
001-001      23.12.2009    Normal      Normal
001-002      08.11.2007    Normal      Normal
001-003
....

So, I can load the file as:

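import numpy as np
import pandas as pd

frame = pd.read_csv('file.csv')
# Get the unique subjects
unique_subjects = frame['patient_id'].unique()
# Use numpy to randomly select 100 distinct patients
# (replace=False, otherwise the same patient can be drawn more than once)
random_us = np.random.choice(unique_subjects, 100, replace=False)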

I could then go through the file row by row and select which rows to write to the new CSV file.

I have a feeling pandas provides something more direct. Is there a way to chain all these operations with it?

Upvotes: 0

Views: 51

Answers (1)

Quang Hoang

Reputation: 150825

You can use isin to keep only the rows whose patient_id is among the selected IDs:

random_records = frame[frame['patient_id'].isin(random_us)]
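
For completeness, here is a minimal end-to-end sketch of the whole pipeline (load, sample patients, filter, save). It is only one way to do it and rests on a few assumptions: the helper name sample_patient_records, the seed argument, and the tiny inline data set are made up for illustration, and it swaps np.random.choice for pandas' own Series.sample, which draws without replacement by default. An in-memory buffer stands in for the real input and output files.

```python
import io

import pandas as pd


def sample_patient_records(frame, n_patients, id_col="patient_id", seed=None):
    """Return every row belonging to n_patients randomly chosen patients.

    Hypothetical helper, not part of pandas.
    """
    # One entry per patient, then draw n_patients of them without replacement
    chosen = frame[id_col].drop_duplicates().sample(n=n_patients, random_state=seed)
    # Keep all rows whose id is among the chosen patients
    return frame[frame[id_col].isin(chosen)]


# Made-up sample data standing in for file.csv
csv_text = """patient_id,Date,Diagnosis,Comments
001-001,23.12.2008,Normal,Normal
001-001,23.12.2009,Normal,Normal
001-002,08.11.2007,Normal,Normal
001-003,01.02.2010,Normal,Normal
"""

# Read the ids as strings so they are kept exactly as written
frame = pd.read_csv(io.StringIO(csv_text), dtype={"patient_id": str})
random_records = sample_patient_records(frame, n_patients=2, seed=0)

# Write the selection as CSV; pass a file path instead of the buffer to save to disk
out = io.StringIO()
random_records.to_csv(out, index=False)
```

With the real data you would read from 'file.csv', ask for n_patients=100, and pass an output path to to_csv. index=False keeps the pandas row index out of the output file.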

Upvotes: 1
