John
John

Reputation: 43179

Select columns in a pandas DataFrame

I have a pandas dataframe with hundreds of columns of antibiotic names. Each specific antibiotic is coded in the dataframe as ending in E, T, or P to indicate empirical, treatment, or prophylactic regimens.

An example excerpt from the column list is:

['MeropenemP', 'MeropenemE', 'MeropenemT', DoripenemP', 'DoripenemE', 'DoripenemT', ImipenemP', 'ImipenemE', 'ImipenemT', 'BiapenemP', 'BiapenemE', 'BiapenemT', 'PanipenemP', 'PanipenemE', 'PanipenemT','PipTazP', 'PipTazE', 'PipTazT','PiperacillinP', 'PiperacillinE', 'PiperacillinT']

A small sample of data is located here:

Sample antibiotic data

It is simple enough for me to separate out columns any type into separate dataframes with a regex, e.g. to select all the empirically prescribed antibiotics columns I use:

E_cols = master.filter(axis=1, regex=('[a-z]+E$'))

Each column has a binary value (0,1) for prescription of each antibiotic regimen type per person (row).

Question: How would I go about summing the rows of all columns (1's) for each type of regimen type and generating a new column for each result in the dataframe e.g. total_emperical, total_prophylactic, total_treatment.

The reason I want to add to the existing dataframe is that I wish to filter on other values for each regimen type.

Upvotes: 2

Views: 3888

Answers (1)

EdChum
EdChum

Reputation: 393843

Once you've generated the list of columns that match your reg exp then you can just create the new total columns like so:

df['total_emperical'] = df[E_cols].sum(axis=1)

and repeat for the other totals.

Passing axis=1 to sum will sum row-wise

Upvotes: 1

Related Questions