Reputation: 859
I have a bit of a puzzle which is similar to other questions, but there is a slight twist.
I have a dataframe - see below. Each record is unique, and some records have multiple admit locations in the CONCAT column. The CONCAT column reflects the progression of a patient's admission location status.
I want to know where patients ended.
I know that if the text within the CONCAT column contains '3E PICU' or '6EN' or '3MN' or '6E' or '3MC', regardless of any other text in the column, they ended in the ICU.
I know that if a patient had any of the following admit locations within the CONCAT column, WITHOUT any of the ICU locations, they can be considered "ACUTE": '4E' or '5E NSU' or '3E HKU' (see code below for the full list of locations).
I know that if a patient had APU or CPU or PSU, regardless of any other location in the CONCAT column, they can be considered "Psych".
I know that if a patient is not considered ICU, ACUTE or PSYCH, they were not admitted.
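The four rules above can be sketched as a plain Python function before moving to pandas. This is only an illustration under stated assumptions: the names `ICU`, `PSYCH`, `ACUTE` and `admit_status` are hypothetical, the locations are assumed to be separated by commas, the Acute list is taken from the `condition_two` pattern further down, and ICU is assumed to win over Psych when both appear (the rules do not say which takes priority).

```python
# Location sets taken from the question; the names are hypothetical.
ICU = {"3E PICU", "6EN", "3MN", "6E", "3MC"}
PSYCH = {"APU", "CPU", "PSU"}
ACUTE = {"4E", "5E NSU", "3E HKU", "3E", "4MN", "5E SCU", "4MA", "7E", "7E IRU"}


def admit_status(concat: str) -> str:
    """Classify one Concat string; assumes comma-separated locations."""
    # Split into whole locations so "3E" cannot match inside "3E PICU".
    locations = {part.strip().upper() for part in concat.split(",")}
    if locations & ICU:      # any ICU location wins (assumed top priority)
        return "ICU"
    if locations & PSYCH:    # Psych regardless of other non-ICU locations
        return "Psych"
    if locations & ACUTE:    # Acute only when no ICU or Psych location is present
        return "Acute"
    return "Non-Admit"
```

Comparing whole locations rather than substrings is a design choice of this sketch; it sidesteps overlaps such as '3E' versus '3E PICU'.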
Current Data
ID  Concat
1   MAIN, 3E HKU, 6EN
2   ED Eval and Treatment Unit
3   ED Main, 3E PICU
4   ED Main, APU
Desired Data
ID  Concat                      Admit Status
1   MAIN, 3E HKU, 6EN           ICU
2   ED Eval and Treatment Unit  Non-Admit
3   ED Main, 3E PICU            ICU
4   ED Main, APU                Psych
5   ED Main, 5E NSU, 3E HKU     Acute
I am familiar with str.contains, but I need some help logically mapping out the code, especially if if/else conditions are required.
condition_one=new_ADM1["concat"].str.contains("3E PICU|6EN|3MN|6E|3MC", case=False)
condition_two=new_ADM1["concat"].str.contains("4E|5E NSU|3E HKU|3E|4MN|5E SCU|4MA|7E|7E IRU", case=False)
condition_three=new_ADM1["concat"].str.contains("APU|CPU|PSU", case=False)
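One thing worth checking with masks like these is overlap: `str.contains` looks for the pattern anywhere in the string, so '3E' in the second pattern also matches a row whose only location is '3E PICU'. A minimal sketch with the standard `re` module (which uses the same search semantics) shows this; the variable names are hypothetical.

```python
import re

# Patterns copied from the first two conditions in the question.
ICU = re.compile(r"3E PICU|6EN|3MN|6E|3MC", re.IGNORECASE)
ACUTE = re.compile(r"4E|5E NSU|3E HKU|3E|4MN|5E SCU|4MA|7E|7E IRU", re.IGNORECASE)

row = "ED Main, 3E PICU"

icu_hit = ICU.search(row) is not None      # True: "3E PICU" is present
acute_hit = ACUTE.search(row) is not None  # True as well: "3E" matches inside "3E PICU"
```

Because both masks can be True for the same row, the conditions need to be evaluated in a fixed priority order, with ICU checked before Acute.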
Upvotes: 1
Views: 98
Reputation: 71707
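Use Series.str.contains with the given regex patterns, then use np.select to pick from the choices based on the conditions m1, m2 and m3. The trailing $ anchors each pattern to the end of the string, so only the last location listed is tested, and np.select takes the first condition that is True for each row: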
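import numpy as np

# Each pattern is case-insensitive ((?i)) and anchored to the end of the string ($).
m1 = df["Concat"].str.contains("(?i)(?:3E PICU|6EN|3MN|6E|3MC)$")
m2 = df["Concat"].str.contains("(?i)(?:4E|5E NSU|3E HKU|3E|4MN|5E SCU|4MA|7E|7E IRU)$")
m3 = df["Concat"].str.contains("(?i)(?:APU|CPU|PSU)$")
# Rows matching none of the conditions fall back to 'Non-Admit'.
df['Admit Status'] = np.select([m1, m2, m3], ['ICU', 'Acute', 'Psych'], 'Non-Admit')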
Result:
# print(df)
   ID                      Concat Admit Status
0   1           MAIN, 3E HKU, 6EN          ICU
1   2  ED Eval and Treatment Unit    Non-Admit
2   3            ED Main, 3E PICU          ICU
3   4                ED Main, APU        Psych
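If the rules are read as "anywhere in the column" rather than "last location only", the `$` anchor could be dropped. The following is a sketch of that variant, not the approach above: the sample frame is rebuilt from the question's rows, the variable names are hypothetical, and the priority order ICU, then Psych, then Acute is an assumption (the question does not say whether ICU or Psych wins when both appear).

```python
import numpy as np
import pandas as pd

# Sample rows taken from the question (including the fifth "desired" row).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Concat": [
        "MAIN, 3E HKU, 6EN",
        "ED Eval and Treatment Unit",
        "ED Main, 3E PICU",
        "ED Main, APU",
        "ED Main, 5E NSU, 3E HKU",
    ],
})

# Unanchored, case-insensitive patterns: a match anywhere in the string counts.
icu = df["Concat"].str.contains(r"(?i)(?:3E PICU|6EN|3MN|6E|3MC)")
psych = df["Concat"].str.contains(r"(?i)(?:APU|CPU|PSU)")
acute = df["Concat"].str.contains(r"(?i)(?:4E|5E NSU|3E HKU|3E|4MN|5E SCU|4MA|7E|7E IRU)")

# np.select uses the first True condition per row, so list order sets priority.
df["Admit Status"] = np.select(
    [icu, psych, acute],
    ["ICU", "Psych", "Acute"],
    default="Non-Admit",
)
```

Listing the ICU mask first is what keeps a row such as 'ED Main, 3E PICU' from being labelled Acute through the shorter '3E' alternative.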
Upvotes: 1