Reputation: 17
I have a dataframe with 5 columns and want to convert 2 of the columns (Chemo and Surgery) based on their values (greater than 0) to rows (diagnosis series) and add the information like the individual id and diagnosis at age to the rows.
Here is my dataframe
import pandas as pd
data = [['A-1', 'Birth', '0', '0', '0'], ['A-1', 'Lung cancer', '25', '25','25'],['A-1', 'Death', '50', '0','0'],['A-2', 'Birth', '0', '0','0'], ['A-2','Brain cancer', '12', '12','0'],['A-2', 'Skin cancer', '20','20','20'], ['A-2', 'Current age', '23', '0','0'],['A-3', 'Birth','0','0','0'], ['A-3', 'Brain cancer', '30', '0','30'], ['A-3', 'Lung cancer', '33', '33', '0'], ['A-3', 'Current age', '35', '0','0']]
df = pd.DataFrame(data, columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"])
print df
I have tried to get the values where the Chemo/Surgery is greater than 0 but when I tried to add it as a row, it doesn't work.
This is what I want the end result to be.
ID Diagnosis Age at Diagnosis
0 A-1 Birth 0
1 A-1 Lung cancer 25
2 A-1 Chemo 25
3 A-1 Surgery 25
4 A-1 Death 50
5 A-2 Birth 0
6 A-2 Brain cancer 12
7 A-2 Chemo 12
8 A-2 Skin cancer 20
9 A-2 Chemo 20
10 A-2 Surgery 20
11 A-2 Current age 23
12 A-3 Birth 0
13 A-3 Brain cancer 30
14 A-3 Surgery 30
15 A-3 Lung cancer 33
16 A-3 Chemo 33
17 A-3 Current age 35
This is one of the things I have tried:
chem = "Chemo"
try_df = (df[chem] > 1)
nd = df[try_df]
df["Diagnosis"] = df[chem]
print df
Upvotes: 1
Views: 178
Reputation: 150735
We can melt the two columns Chemo
and Surgery
, then drop all the zero and concat
back:
# melt the two columns
new_df = df[['ID', 'Chemo', 'Surgery']].melt(id_vars='ID',
value_name='Age at Diagnosis',
var_name='Diagnosis')
# filter out the zeros
new_df = new_df[new_df['Age at Diagnosis'].ne('0')]
# concat with the original dataframe, ignoring the extra columns
new_df = pd.concat((df,new_df), sort=False, join='inner')
# sort values
new_df.sort_values(['ID','Age at Diagnosis'])
Output:
ID Diagnosis Age at Diagnosis
0 A-1 Birth 0
1 A-1 Lung cancer 25
1 A-1 Chemo 25
12 A-1 Surgery 25
2 A-1 Death 50
3 A-2 Birth 0
4 A-2 Brain cancer 12
4 A-2 Chemo 12
5 A-2 Skin cancer 20
5 A-2 Chemo 20
16 A-2 Surgery 20
6 A-2 Current age 23
7 A-3 Birth 0
8 A-3 Brain cancer 30
19 A-3 Surgery 30
9 A-3 Lung cancer 33
9 A-3 Chemo 33
10 A-3 Current age 35
Upvotes: 3
Reputation: 1541
This attempt is pretty verbose and takes a few steps. WE can't do a simple pivot or index/column stacking because we need to modify one column with partial results from another. This requires splitting and appending.
Firstly, convert your dataframe into dtypes we can work with.
data = [['A-1', 'Birth', '0', '0', '0'], ['A-1', 'Lung cancer', '25', '25','25'],['A-1', 'Death', '50', '0','0'],['A-2', 'Birth', '0', '0','0'], ['A-2','Brain cancer', '12', '12','0'],['A-2', 'Skin cancer', '20','20','20'], ['A-2', 'Current age', '23', '0','0'],['A-3', 'Birth','0','0','0'], ['A-3', 'Brain cancer', '30', '0','30'], ['A-3', 'Lung cancer', '33', '33', '0'], ['A-3', 'Current age', '35', '0','0']]
df = pd.DataFrame(data, columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"])
df[["Age at Diagnosis", "Chemo", "Surgery"]] = df[["Age at Diagnosis", "Chemo", "Surgery"]].astype(int)
Now we split the thing up into bits and pieces.
# I like making a copy or resetting an index so that
# pandas is not operating off a slice
df_chemo = df[df.Chemo > 0].copy()
df_surgery = df[df.Surgery > 0].copy()
# drop columns you don't need
df_chemo.drop(["Chemo", "Surgery"], axis=1, inplace=True)
df_surgery.drop(["Chemo", "Surgery"], axis=1, inplace=True)
df.drop(["Chemo", "Surgery"], axis=1, inplace=True)
# Set Chemo and Surgery Diagnosis
df_chemo.Diagnosis = "Chemo"
df_surgery.Diagnosis = "Surgery"
Then append everything together. You can do this because the column dimensions match.
df_new = df.append(df_chemo).append(df_surgery)
# make it look pretty
df_new.sort_values(["ID", "Age at Diagnosis"]).reset_index(drop=True)
Upvotes: 1