kostya ivanov

Reputation: 707

How to split a group of related columns in pandas?

I hope you can help me. I have a dataframe with two columns (CONFIRM_STATUS and OUTCOME) whose combination determines the meaning of a third column (VALUE).

CONFIRM_STATUS has 4 unique values

result1 = df['CONFIRM_STATUS'].unique()
result1
array(['CONFIRMED', 'PROBABLE', 'SUSPECTED', 'TOTAL'], dtype=object)

OUTCOME has 2 unique values

result2 = df['OUTCOME'].unique()
result2
array(['CASE', 'DEATH'], dtype=object)

As a result, there are 8 unique combinations, and each one determines what the number in VALUE means. I need to turn these combinations into 8 columns, one per combination. Roughly speaking: death, recovery,...

How can this be done with pandas? I know this is not very detailed, so here is a sample of the relevant rows.

    EVENT_NAME  SOURCE  DATE_LOW    DATE_HIGH   DATE_REPORT DATE_TYPE   SPATIAL_RESOLUTION  AL0_CODE    AL0_NAME    AL1_CODE    AL1_NAME    AL2_NAME    AL3_NAME    LOCALITY_NAME   LOCATION_TYPE   CONFIRM_STATUS  OUTCOME CUMULATIVE_FLAG VALUE
2752    nCoV_2019   WHO COVID-19 Overview   2020-01-03  2020-01-03  2020-01-03  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    CONFIRMED   CASE    False   0
2753    nCoV_2019   WHO COVID-19 Overview   2020-01-03  2020-01-03  2020-01-03  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    CONFIRMED   CASE    True    0
2754    nCoV_2019   WHO COVID-19 Overview   2020-01-03  2020-01-03  2020-01-03  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    CONFIRMED   DEATH   False   0
2755    nCoV_2019   WHO COVID-19 Overview   2020-01-03  2020-01-03  2020-01-03  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    CONFIRMED   DEATH   True    0
2756    nCoV_2019   WHO COVID-19 Overview   2020-01-03  2020-01-03  2020-01-03  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    PROBABLE    CASE    False   0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4494958 nCoV_2019   WHO COVID-19 Overview   2020-11-22  2020-11-22  2020-11-22  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    SUSPECTED   DEATH   False   0
4494959 nCoV_2019   WHO COVID-19 Overview   2020-11-22  2020-11-22  2020-11-22  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    TOTAL   CASE    False   24581
4494960 nCoV_2019   WHO COVID-19 Overview   2020-11-22  2020-11-22  2020-11-22  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    TOTAL   CASE    True    2089329
4494961 nCoV_2019   WHO COVID-19 Overview   2020-11-22  2020-11-22  2020-11-22  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    TOTAL   DEATH   False   401
4494962 nCoV_2019   WHO COVID-19 Overview   2020-11-22  2020-11-22  2020-11-22  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    TOTAL   DEATH   True    36179

Upvotes: 1

Views: 65

Answers (1)

Jonathan Leon

Reputation: 5648

I didn't rebuild your dataframe but you should be able to just create 8 new columns like this example (I only show two). You can get fancier with creating the combinations and building the columns but if it's only eight, just code it simply.

df[['CASE_CONFIRMED', 'CASE_PROBABLE']] = ''

Once the columns exist, select the rows that match each combination of CONFIRM_STATUS and OUTCOME and set the corresponding new column to VALUE for those rows.

df.loc[(df['CONFIRM_STATUS'] == 'CONFIRMED') & (df['OUTCOME'] == 'CASE'), 'CASE_CONFIRMED'] = df['VALUE']
df.loc[(df['CONFIRM_STATUS'] == 'PROBABLE') & (df['OUTCOME'] == 'CASE'), 'CASE_PROBABLE'] = df['VALUE']
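
For all eight combinations, the same idea can be written as a loop instead of eight hand-written lines. This is only a sketch: the small dataframe below is a made-up stand-in for the real one, and the OUTCOME_STATUS column naming simply follows the two examples above. It uses Series.where, which leaves NaN (rather than an empty string) in rows that do not match.

```python
import pandas as pd

# Toy stand-in for the real dataframe; the VALUE numbers are invented.
df = pd.DataFrame({
    "CONFIRM_STATUS": ["CONFIRMED", "CONFIRMED", "PROBABLE", "TOTAL"],
    "OUTCOME": ["CASE", "DEATH", "CASE", "DEATH"],
    "VALUE": [10, 1, 3, 2],
})

statuses = ["CONFIRMED", "PROBABLE", "SUSPECTED", "TOTAL"]
outcomes = ["CASE", "DEATH"]

for outcome in outcomes:
    for status in statuses:
        # Rows that belong to this particular (status, outcome) combination.
        mask = (df["CONFIRM_STATUS"] == status) & (df["OUTCOME"] == outcome)
        # Keep VALUE where the mask is True, NaN everywhere else.
        df[f"{outcome}_{status}"] = df["VALUE"].where(mask)
```

Each original row ends up with exactly one of the eight new columns filled in; the row count does not change.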

If that doesn't work, paste part of the dataset using df.head(15).to_json().
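
If what you actually want is one row per report date with the combinations side by side, a pivot may be a better fit than adding columns to the long table. This is a different technique from the one above, sketched under assumptions: it takes the four TOTAL rows for 2020-11-22 from the question, and it assumes DATE_REPORT plus CUMULATIVE_FLAG is enough to identify a row (the real data probably needs more key columns, such as AL0_CODE).

```python
import pandas as pd

# The four TOTAL rows for 2020-11-22 shown in the question.
long_df = pd.DataFrame({
    "DATE_REPORT": ["2020-11-22"] * 4,
    "CUMULATIVE_FLAG": [False, True, False, True],
    "CONFIRM_STATUS": ["TOTAL"] * 4,
    "OUTCOME": ["CASE", "CASE", "DEATH", "DEATH"],
    "VALUE": [24581, 2089329, 401, 36179],
})

wide = long_df.pivot_table(
    index=["DATE_REPORT", "CUMULATIVE_FLAG"],  # assumed row key
    columns=["OUTCOME", "CONFIRM_STATUS"],
    values="VALUE",
    aggfunc="first",  # take the single value per cell as-is
)
# Flatten the two-level column index into names like CASE_TOTAL.
wide.columns = [f"{outcome}_{status}" for outcome, status in wide.columns]
wide = wide.reset_index()
```

Here wide has one row per (DATE_REPORT, CUMULATIVE_FLAG) pair and one column per combination present in the data.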

Upvotes: 1
