Reputation: 9536
Sorry for the unclear title, but I'm having a hard time putting what I want to do into words.
I have a CSV dataset that's not very clear in how it presents information:
Basically, each image may be associated with one or more subtypes. I managed to manipulate the dataset in such a way that it's much easier to read:
Now, I'd like to condense each row into one ID, and True/False for each of the subtypes, like so:
ID Epidural Intraparenchymal Intraventricular Subarachnoid Subdural Any
aec8e68b3 False False False True False True
Hopefully I've made myself clear. Can anybody help in this endeavor?
edit: dictionary at the request of @Ch3steR
{'ID': {21: 'aec8e68b3',
23: 'aec8e68b3',
57: 'ff7125125',
59: 'ff7125125',
115: '9b7d000a2',
119: '9b7d000a2',
172: '85f1fa3bd',
173: '85f1fa3bd',
181: '4598fa77a',
182: '4598fa77a',
184: '4598fa77a',
185: '4598fa77a',
208: '4c45dcf36',
209: '4c45dcf36',
313: 'a837bb1fc',
315: 'a837bb1fc',
317: 'a837bb1fc',
358: '8927bc6ee',
359: '8927bc6ee',
369: 'ffc54be74',
371: 'ffc54be74',
439: '945b3f1e4',
442: '945b3f1e4',
443: '945b3f1e4',
483: '35b9fc2cc'},
'Category': {21: 'subarachnoid',
23: 'any',
57: 'subarachnoid',
59: 'any',
115: 'intraparenchymal',
119: 'any',
172: 'subdural',
173: 'any',
181: 'intraparenchymal',
182: 'intraventricular',
184: 'subdural',
185: 'any',
208: 'subdural',
209: 'any',
313: 'intraparenchymal',
315: 'subarachnoid',
317: 'any',
358: 'subdural',
359: 'any',
369: 'subarachnoid',
371: 'any',
439: 'intraparenchymal',
442: 'subdural',
443: 'any',
483: 'subarachnoid'}}
Upvotes: 0
Views: 232
Reputation: 20669
You can use pd.crosstab
here.
df
ID Category
21 aec8e68b3 subarachnoid
23 aec8e68b3 any
57 ff7125125 subarachnoid
59 ff7125125 any
115 9b7d000a2 intraparenchymal
119 9b7d000a2 any
172 85f1fa3bd subdural
173 85f1fa3bd any
181 4598fa77a intraparenchymal
182 4598fa77a intraventricular
184 4598fa77a subdural
185 4598fa77a any
208 4c45dcf36 subdural
209 4c45dcf36 any
pd.crosstab(df['ID'], df['Category']).astype(bool)
Category any intraparenchymal intraventricular subarachnoid subdural
ID
4598fa77a True True True False True
4c45dcf36 True False False False True
85f1fa3bd True False False False True
9b7d000a2 True True False False False
aec8e68b3 True False False True False
ff7125125 True False False True False
_.rename_axis(index=None, columns=None)
OR
Using pd.pivot_table
df.pivot_table(index='ID', columns='Category', fill_value=0, aggfunc='size')
df.assign(val = True).pivot_table(index='ID', columns= 'Category',
values = 'val',fill_value=False)
Another idea of using df.unstack
df.assign(val = True).set_index(['ID', 'Category']).val.unstack(fill_value= False)
Upvotes: 1
Reputation: 1432
You can use reset index to get a desired format.
pd.crosstab(df['ID'], df['Category']).astype(bool).reset_index()
Upvotes: 0