Reputation: 145
I am trying to sort through data from a survey I conducted. For example, one question asked respondents whether they had been diagnosed with any of the following: ADHD, Anxiety, Depression, etc. As a result, one of my columns holds multiple comma-separated values for people who selected more than one disorder. I'd like to create a column for each disorder that holds True or False depending on whether the respondent selected it.
{'Mental_health':
['ADHD, Anxiety, Depression',
'ADHD, Depression, PTSD',
'Anxiety, Borderline Personality Disorder, Depression',
'OCD',
'Anxiety',
'Anxiety',
'ADHD, Anxiety, Bipolar, Borderline Personality Disorder, Depression, PTSD, Schizophrenia',
'ADHD, Anxiety, Autism, Depression, PTSD',
'Anxiety, Depression',
'Depression',
'Depression',
'None of the above',
'Autism, Depression, PTSD',
'None of the above',
'ADHD, PTSD']
}
Upvotes: 0
Views: 223
Reputation: 14103
import pandas as pd
from io import StringIO

# sample data
s = """Mental_Health
ADHD, Anxiety, Depression
ADHD, Depression, PTSD
Anxiety, Borderline Personality Disorder, Depression
OCD
Anxiety
Anxiety
ADHD, Anxiety, Bipolar, Borderline Personality Disorder, Depression, PTSD, Schizophrenia
ADHD, Anxiety, Autism, Depression, PTSD
Anxiety, Depression
Depression
Depression
None of the above
Autism, Depression, PTSD
None of the above
ADHD, PTSD"""
df = pd.read_csv(StringIO(s), sep='|')
# str.split then expand list into columns and stack
new = df['Mental_Health'].str.split(', ', expand=True).stack()
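# get_dummies, then sum per original row (level 0 of the stacked index);
# sum(level=0) was removed in pandas 2.0, so group by the level instead
final_df = pd.get_dummies(new).groupby(level=0).sum().astype(bool)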
ADHD Anxiety Autism Bipolar Borderline Personality Disorder \
0 True True False False False
1 True False False False False
2 False True False False True
3 False False False False False
4 False True False False False
5 False True False False False
6 True True False True True
7 True True True False False
8 False True False False False
9 False False False False False
10 False False False False False
11 False False False False False
12 False False True False False
13 False False False False False
14 True False False False False
Depression None of the above OCD PTSD Schizophrenia
0 True False False False False
1 True False False True False
2 True False False False False
3 False False True False False
4 False False False False False
5 False False False False False
6 True False False True True
7 True False False True False
8 True False False False False
9 True False False False False
10 True False False False False
11 False True False False False
12 True False False True False
13 False True False False False
14 False False False True False
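As a possible shortcut for the split, stack and get_dummies steps above, pandas also provides Series.str.get_dummies, which splits on a separator and builds the indicator columns in one call. The sketch below is a minimal example under assumptions: it uses a shortened four-row sample rather than the full survey data, the column name Mental_health is taken from the question, and the name flags is only illustrative.

```python
import pandas as pd

# Shortened sample (assumption: same comma-and-space separated format as the survey column)
df = pd.DataFrame({
    "Mental_health": [
        "ADHD, Anxiety, Depression",
        "OCD",
        "None of the above",
        "ADHD, PTSD",
    ]
})

# Split each cell on ", " and build one 0/1 indicator column per distinct value,
# then convert the 0/1 integers to True/False
flags = df["Mental_health"].str.get_dummies(sep=", ").astype(bool)

print(flags)
```

The separator has to match the data exactly (here a comma followed by a space); otherwise values such as " Anxiety" with a leading space would end up in their own columns.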
Upvotes: 1
Reputation: 4021
Use the str accessor, then split with the expand argument:
results = df.Mental_health.str.split(', ', expand=True)
You can then append these results to the original df:
df_f = pd.concat([df, results], axis=1)
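Note that the split above produces positional columns (0, 1, 2, ...) holding the first, second, third selected value, not one True/False column per disorder. One way to get from the split to the requested layout, sketched here under assumptions, is to explode the split lists and cross-tabulate row against disorder. The three-row sample and the names exploded and flags are illustrative only and do not come from the answer.

```python
import pandas as pd

# Small illustrative sample (assumption: same format as the question's column)
df = pd.DataFrame({"Mental_health": ["ADHD, Anxiety", "OCD", "Anxiety"]})

# One row per (respondent, disorder); the original row index is repeated
exploded = df["Mental_health"].str.split(", ").explode()

# Count each disorder per original row, then turn the counts into True/False
flags = pd.crosstab(exploded.index.to_numpy(), exploded.to_numpy()).astype(bool)

# Append the indicator columns to the original df, aligned on the row index
df_f = pd.concat([df, flags], axis=1)

print(df_f)
```

Because the cross-tabulation is keyed by the original row index, the concat lines the indicator columns up with the rows they came from.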
Upvotes: 0