Alexander

Reputation: 145

Expanding one column to many columns with pandas?

I am trying to sort through data from a survey I conducted. For example, I asked respondents whether they had been diagnosed with any of the following: ADHD, Anxiety, Depression, etc. One of my columns therefore holds multiple comma-separated values for people who selected more than one disorder. I'd like to create a column for each disorder, holding True or False depending on whether the respondent selected it.

{'Mental_health':
    ['ADHD, Anxiety, Depression',
     'ADHD, Depression, PTSD',
     'Anxiety, Borderline Personality Disorder, Depression',
     'OCD',
     'Anxiety',
     'Anxiety',
     'ADHD, Anxiety, Bipolar, Borderline Personality Disorder, Depression, PTSD, Schizophrenia',
     'ADHD, Anxiety, Autism, Depression, PTSD',
     'Anxiety, Depression',
     'Depression',
     'Depression',
     'None of the above',
     'Autism, Depression, PTSD',
     'None of the above',
     'ADHD, PTSD']
}

Upvotes: 0

Views: 223

Answers (2)

It_is_Chris

Reputation: 14103

import pandas as pd
from io import StringIO

# sample data
s = """Mental_Health
ADHD, Anxiety, Depression
ADHD, Depression, PTSD
Anxiety, Borderline Personality Disorder, Depression
OCD
Anxiety
Anxiety
ADHD, Anxiety, Bipolar, Borderline Personality Disorder, Depression, PTSD, Schizophrenia
ADHD, Anxiety, Autism, Depression, PTSD
Anxiety, Depression
Depression
Depression
None of the above
Autism, Depression, PTSD
None of the above
ADHD, PTSD"""
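# '|' never occurs in the data, so each line is read as a single column
df = pd.read_csv(StringIO(s), sep='|')
# str.split then expand list into columns and stack
new = df['Mental_Health'].str.split(', ', expand=True).stack()
# get_dummies, then sum within each original row index and convert counts to booleans
final_df = pd.get_dummies(new).groupby(level=0).sum().astype(bool)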

     ADHD  Anxiety  Autism  Bipolar  Borderline Personality Disorder  \
0    True     True   False    False                            False   
1    True    False   False    False                            False   
2   False     True   False    False                             True   
3   False    False   False    False                            False   
4   False     True   False    False                            False   
5   False     True   False    False                            False   
6    True     True   False     True                             True   
7    True     True    True    False                            False   
8   False     True   False    False                            False   
9   False    False   False    False                            False   
10  False    False   False    False                            False   
11  False    False   False    False                            False   
12  False    False    True    False                            False   
13  False    False   False    False                            False   
14   True    False   False    False                            False   

    Depression  None of the above    OCD   PTSD  Schizophrenia  
0         True              False  False  False          False  
1         True              False  False   True          False  
2         True              False  False  False          False  
3        False              False   True  False          False  
4        False              False  False  False          False  
5        False              False  False  False          False  
6         True              False  False   True           True  
7         True              False  False   True          False  
8         True              False  False  False          False  
9         True              False  False  False          False  
10        True              False  False  False          False  
11       False               True  False  False          False  
12        True              False  False   True          False  
13       False               True  False  False          False  
14       False              False  False   True          False  
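
As a possible shortcut for the same result, pandas also provides Series.str.get_dummies, which splits on a separator and builds the indicator columns in one step. The sketch below uses a shortened, made-up subset of the sample data and assumes the values are always separated by exactly a comma and a space:

```python
import pandas as pd

# Shortened illustrative sample (not the full survey data)
df = pd.DataFrame({
    "Mental_Health": [
        "ADHD, Anxiety, Depression",
        "OCD",
        "None of the above",
        "ADHD, PTSD",
    ]
})

# Split each cell on ', ' and build one 0/1 indicator column per distinct value,
# then convert the 0/1 integers to True/False
flags = df["Mental_Health"].str.get_dummies(sep=", ").astype(bool)
print(flags)
```

If the separator is inconsistent (for example, some entries use ',' with no space), the values would need to be normalized first, otherwise the same disorder could end up in two differently named columns.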

Upvotes: 1

jcaliz

Reputation: 4021

Use str.split with the expand parameter:

results = df.Mental_health.str.split(', ', expand=True)

You can append these results to the original df:

df_f = pd.concat([df, results], axis=1)
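
Note that split with expand=True produces positional columns (0, 1, 2, ...), not one column per disorder, so a further step is needed to get the True/False layout asked for in the question. A minimal sketch of one way to do that, using a small made-up sample and explode plus get_dummies (a substitute technique, not part of the answer above):

```python
import pandas as pd

# Small illustrative sample (hypothetical, not the full survey data)
df = pd.DataFrame({"Mental_health": ["ADHD, Anxiety", "OCD", "Anxiety"]})

# expand=True gives positional columns 0, 1, ...; shorter rows are padded with missing values
results = df["Mental_health"].str.split(", ", expand=True)

# One row per (respondent, disorder) pair, keeping the original row index
exploded = df["Mental_health"].str.split(", ").explode()

# Indicator columns per disorder, collapsed back to one row per respondent
flags = pd.get_dummies(exploded).groupby(level=0).max().astype(bool)

# Append the True/False columns to the original frame
df_f = pd.concat([df, flags], axis=1)
print(df_f)
```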

Upvotes: 0
