Reputation: 365
I have a presence/absence dataframe that looks like this (it's much larger, but I have reduced it for this question):
annotations factor1 factor2 factor3 Class
heroine 1 0 1 OPIOID_TYPE
he smokes 0 1 0 OTHER_DRUG_USE
heroin 1 0 1 OPIOID_TYPE
What I would like to do is create a new dataframe for each unique value in 'Class'. In each new dataframe, the 'Class' column is replaced by a last column named after that value, which records presence/absence (1 if the row belongs to that class, 0 otherwise).
In other words:
annotations factor1 factor2 factor3 OPIOID_TYPE
heroine 1 0 1 1
he smokes 0 1 0 0
heroin 1 0 1 1
and:
annotations factor1 factor2 factor3 OTHER_DRUG_USE
heroine 1 0 1 0
he smokes 0 1 0 1
heroin 1 0 1 0
In reality, my dataframe is much larger with 2289 rows and 1273 columns and exactly 23 unique values in 'Class' for a total of 23 new dataframes.
I assume a loop structure would work here, but I have limited experience with looping in Python.
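For anyone who wants to reproduce this, here is a minimal sketch that builds the reduced sample above and shows the last column I am after for one class. The variable names `df` and `opioid_flag` are only illustrative, and the real data is of course much larger.

```python
import pandas as pd

# Reduced sample copied from the table in the question.
df = pd.DataFrame({
    "annotations": ["heroine", "he smokes", "heroin"],
    "factor1": [1, 0, 1],
    "factor2": [0, 1, 0],
    "factor3": [1, 0, 1],
    "Class": ["OPIOID_TYPE", "OTHER_DRUG_USE", "OPIOID_TYPE"],
})

# Desired last column for a single class: 1 where 'Class' matches, 0 otherwise.
opioid_flag = (df["Class"] == "OPIOID_TYPE").astype(int)
print(opioid_flag.tolist())  # [1, 0, 1]
```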
Upvotes: 1
Views: 2762
Reputation: 120399
You can iterate over your Class values:
dfs = {}
for klass in df['Class'].unique():
    dfs[klass] = df.assign(**{klass: df['Class'].eq(klass).astype(int)}) \
                   .drop(columns='Class')
Now you have a dict indexed by Class values:
>>> dfs.keys()
dict_keys(['OPIOID_TYPE', 'OTHER_DRUG_USE'])
>>> dfs['OPIOID_TYPE']
annotations factor1 factor2 factor3 OPIOID_TYPE
0 heroine 1 0 1 1
1 he smokes 0 1 0 0
2 heroin 1 0 1 1
>>> dfs['OTHER_DRUG_USE']
annotations factor1 factor2 factor3 OTHER_DRUG_USE
0 heroine 1 0 1 0
1 he smokes 0 1 0 1
2 heroin 1 0 1 0
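The same loop can be wrapped in a small helper, which may be convenient with 23 classes. This is only a sketch: the function name `split_by_class` and its `class_col` parameter are hypothetical, and it assumes the class values are strings (the `**` unpacking passed to `assign` requires string keys) that do not clash with existing column names.

```python
import pandas as pd

def split_by_class(df, class_col="Class"):
    """Return {class value: df without class_col, plus a 0/1 column named after the value}."""
    features = df.drop(columns=class_col)
    return {
        klass: features.assign(**{klass: df[class_col].eq(klass).astype(int)})
        for klass in df[class_col].unique()
    }

# Sample data from the question.
df = pd.DataFrame({
    "annotations": ["heroine", "he smokes", "heroin"],
    "factor1": [1, 0, 1],
    "factor2": [0, 1, 0],
    "factor3": [1, 0, 1],
    "Class": ["OPIOID_TYPE", "OTHER_DRUG_USE", "OPIOID_TYPE"],
})

dfs = split_by_class(df)

# Sanity check: every row is flagged in exactly one of the resulting frames.
flags_per_row = sum(frame[klass] for klass, frame in dfs.items())
assert (flags_per_row == 1).all()
```

The sanity check at the end is cheap to run on the full data and catches missing or unexpected values in the class column.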
Now, if you really want separate Python variables, you can use locals() to create them dynamically (this works at module level or in an interactive session, but not reliably inside a function):
for idx, klass in enumerate(df['Class'].unique(), 1):
    print(f"df{idx} is for '{klass}' class")
    locals()[f"df{idx}"] = df.assign(**{klass: df['Class'].eq(klass).astype(int)}) \
                             .drop(columns='Class')
# Output:
df1 is for 'OPIOID_TYPE' class
df2 is for 'OTHER_DRUG_USE' class
Output:
>>> df1
annotations factor1 factor2 factor3 OPIOID_TYPE
0 heroine 1 0 1 1
1 he smokes 0 1 0 0
2 heroin 1 0 1 1
>>> df2
annotations factor1 factor2 factor3 OTHER_DRUG_USE
0 heroine 1 0 1 0
1 he smokes 0 1 0 1
2 heroin 1 0 1 0
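One caveat worth illustrating: the locals() trick depends on running at module level, where locals() is the same mapping as globals(). The sketch below (the function and variable names are hypothetical) shows that the same write inside a function does not produce a usable variable, which is one reason the dict is usually the safer choice.

```python
def try_to_create_variable():
    # Writing into the mapping returned by locals() does not define a real
    # local variable inside a function.
    locals()["df_created_in_function"] = "placeholder"
    try:
        # The name is resolved as a global, which was never defined.
        return df_created_in_function
    except NameError:
        return None

result = try_to_create_variable()
print(result)  # None

# A dict keyed by the generated names gives the same lookup without dynamic variables.
names = {f"df{idx}": klass
         for idx, klass in enumerate(["OPIOID_TYPE", "OTHER_DRUG_USE"], 1)}
print(names["df1"])  # OPIOID_TYPE
```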
Upvotes: 1
Reputation: 323226
We can use get_dummies and save the resulting dataframes in a dict:
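# pop removes 'Class' from df in place; str.get_dummies builds one 0/1 column per class
s = df.pop('Class').str.get_dummies()
# join a single indicator column onto the remaining columns for each class
d = {x : df.join(s[[x]]) for x in s}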
Example output below
d['OPIOID_TYPE']
Out[43]:
annotations factor1 factor2 factor3 OPIOID_TYPE
0 heroine 1 0 1 1
1 he smokes 0 1 0 0
2 heroin 1 0 1 1
d['OTHER_DRUG_USE']
Out[44]:
annotations factor1 factor2 factor3 OTHER_DRUG_USE
0 heroine 1 0 1 0
1 he smokes 0 1 0 1
2 heroin 1 0 1 0
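If you would rather keep the original dataframe intact, a variant along the following lines should also work. It is a sketch, not the code above: it swaps in the top-level `pd.get_dummies` (with `dtype=int` so the indicators are 0/1 rather than booleans) and uses `drop` instead of `pop`. It also sidesteps the fact that `Series.str.get_dummies` splits values on `'|'` by default, which would matter if a class name ever contained that character.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "annotations": ["heroine", "he smokes", "heroin"],
    "factor1": [1, 0, 1],
    "factor2": [0, 1, 0],
    "factor3": [1, 0, 1],
    "Class": ["OPIOID_TYPE", "OTHER_DRUG_USE", "OPIOID_TYPE"],
})

# One 0/1 indicator column per class value.
dummies = pd.get_dummies(df["Class"], dtype=int)

# drop returns a new frame, so df keeps its 'Class' column.
features = df.drop(columns="Class")

# One dataframe per class: the shared columns plus that class's indicator.
d = {klass: features.join(dummies[[klass]]) for klass in dummies.columns}

print(d["OTHER_DRUG_USE"])
```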
Upvotes: 1