Apply Function To Multiple Variables

Question

I have a database that looks like this.

ID   Covid_pos  Asymptomatic   Fever   Cough  Rash
1        1          0            1      0      1
2        0          0            0      1      0
3        1          1            0      1      1
4        1          0            1      0      1
5        0          1            1      0      0

From this data my goal is to create an output that looks like this

Symptom          All Tested(5308, 100%)   SARS-COV-2 PCR positive (N,%) 
Asymptomatic        2528(47.63%)                 163(6.45%)
Fever               958(23.85%)                  43(3.53%)
Cough               159(3.95%)                   22(9.72%)
Rash                19(23.05%)                   88(18.40%)

I have written a code that will produce the desired output for one of my my variables; however, I would like to create either a macro or function so I can apply it to all of my symptom variables. Thus, instead of copying and pasting this code 8+ times and changing it whenever the code says "Asymptomatic" to the next symptom, I was curious if there are any other options you suggest I explore. Somewhat new to to Python, so all strategies are welcome!

AsyOdds_Percent = pd.crosstab(df_merged2["Asymptomatic"],df_merged2.Covid_pos)
AsyOdds_Percent = pd.DataFrame(AsyOdds_Percent.to_records()).rename(columns={'Asymptomatic':'Asymptomatic','0':'Neg_%','1':'Pos_%'}).fillna(0)
AsyOdds_Percent["Total_%"] = AsyOdds_Percent.sum(axis=1)

AsyOdds_Count=pd.crosstab(df_merged2["Asymptomatic"],df_merged2.Covid_pos)
AsyOdds_Count1 = pd.DataFrame(AsyOdds_Count.to_records()).rename(columns={'Asymptomatic':'Asymptomatic','0':'Neg_N','1':'Pos_N'}).fillna(0)
AsyOdds_Count1["Total_N"] = AsyOdds_Count1.sum(axis=1)

cols = AsyOdds_Percent.columns[1:4]
AsyOdds_Percent[cols] = AsyOdds_Percent[cols]/AsyOdds_Percent[cols].sum()*100
Merged = pd.merge(AsyOdds_Count1,AsyOdds_Percent, on='Asymptomatic', how='left')
Merged['%_Pos'] = (Merged['Pos_N']/Merged['Total_N'])*100
Merged['%_Pos'] = round(Merged['%_Pos'], 2)
Merged['Total_%'] = round(Merged['Total_%'], 2)
Merged = Merged[['Asymptomatic','Pos_N','Pos_%','Neg_N','Neg_%','Total_N','Total_%','%_Pos']]
Merged = Merged.loc[Merged['Asymptomatic'] == 1]
Merged = Merged[['Asymptomatic','Total_N','Total_%','Pos_N','%_Pos']]
Merged = Merged.rename(columns = {"Asymptomatic": "Symptoms"})

a1 = (Merged["Symptoms"] == 1)
conditions = [a1]
Merged['Symptoms'] = np.select([a1], ['Asymptomatic'])
  
Merged['All Tested (5308, 100%)'] = Merged['Total_N'].map(str) + '(' + Merged['Total_%'].map(str) + '%)'
Merged['SARS-COV-2 PCR positive (N,%)'] = Merged['Pos_N'].map(str) + '(' + Merged['%_Pos'].map(str) + '%)'
Merged=Merged[['Symptoms','All Tested (5308, 100%)','SARS-COV-2 PCR positive (N,%)']]
print(Merged)

OUTPUT:

       Symptoms All Tested (5308, 100%) SARS-COV-2 PCR positive (N,%)
1  Asymptomatic            2528(47.63%)                    163(6.45%)

Valdi_Bo · Accepted Answer

I used the following data sample (df):

   Covid_pos  Asymptomatic  Fever  Cough
0          1             0      1      0
1          0             0      0      1
2          1             1      0      1
3          1             0      1      0
4          0             1      1      0
5          1             0      1      0
6          0             1      1      0
7          1             0      0      1
8          0             0      0      0
9          0             0      0      0

Start from defining 3 functions:

def colSums(col):
    return pd.Series([col.sum(), col.loc[1].sum()], index=['All', 'Pos'])
def withPct(x):
    return f'{x}({x / total * 100}%)'
def colTitle(head, n1):
    return f'{head}({n1}, {n1/total*100}%)'

Then compute required totals:

total = df.index.size
totalPos = df.Covid_pos.sum()

And the whole processing (for all source columns) boils down to just 2 instructions:

res = df.set_index('Covid_pos').apply(colSums).T.applymap(withPct)
res.columns = [colTitle('All Tested', total),
    colTitle('SARS-COV-2 PCR positive', totalPos)]

The result is:

             All Tested(10, 100.0%) SARS-COV-2 PCR positive(5, 50.0%)
Asymptomatic               3(30.0%)                          1(10.0%)
Fever                      5(50.0%)                          3(30.0%)
Cough                      3(30.0%)                          2(20.0%)

Edit

To compute percentage in the "positive" column relative to the number of positive cases, proceed as follows:

Compute the result in absolute numbers:

 res = df.set_index('Covid_pos').apply(colSums).T

Compute percentages dividing each column by respective divisor:
```
 wrk = res / [total, totalPos] * 100; wrk
```

Overwrite each column in res with concatenation of "original" values and percentage surrounded in parentheses.

 res.All = res.All.astype(str) + '(' + wrk.All.astype(str) + '%)'
 res.Pos = res.Pos.astype(str) + '(' + wrk.Pos.astype(str) + '%)'

Now the result is:

             All Tested(10, 100.0%) SARS-COV-2 PCR positive(5, 50.0%)
Asymptomatic               3(30.0%)                          1(20.0%)
Fever                      5(50.0%)                          3(60.0%)
Cough                      3(30.0%)                          2(40.0%)

withPct function is now not needed.

Apply Function To Multiple Variables

Answers (2)

Edit

Related Questions