Combine Pandas Columns by Corresponding Dictionary Values

Question

I am looking to quickly combine columns that are genetic complements of each other. I have a large data frame with counts and want to combine columns whose names are complements. I currently have a system that:

Gets the complement of a column name
Checks the column names for the complement
Adds the two columns together if there is a match
Deletes the complement column

However, this is slow (it checks every column name) and gives different column names depending on the column order (i.e., it deletes different complement columns between runs). Is there a way to use a dictionary of key:value pairs to speed up the process and keep the output consistent? Below is an example data frame with the desired result (ATTG|TAAC and CGGG|GCCC are complements).

import pandas as pd

df = pd.DataFrame({"ATTG": [3, 6, 0, 1], "CGGG": [0, 2, 1, 4],
                   "TAAC": [0, 1, 0, 1], "GCCC": [4, 2, 0, 0], "TTTT": [2, 1, 0, 1]})

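## Current approach (runnable version of the original pseudocode)
mytrans = str.maketrans('ATCG', 'TAGC')  # A<->T, C<->G
def complement(seq):
    return seq.translate(mytrans)
for item in list(df.columns):  # iterate over a copy, since columns are deleted inside the loop
    if item in df.columns and complement(item) in df.columns:  # skip columns already merged away
        df[item] = df[item] + df[complement(item)]
        del df[complement(item)]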

## Desired Result
df_result = pd.DataFrame({"ATTG": [3, 7, 0, 2], "CGGG": [4, 4, 1, 4], "TTTT": [2, 1, 0, 1]})

ALollz · Accepted Answer

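Translate each column name into its complement, then relabel each column with whichever of the original name or its complement sorts first alphabetically. A pair of complementary columns then ends up with the same label, so you can group by label and sum. Note that an unpaired column can be renamed as well (TTTT becomes AAAA in the output below).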
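To make the "sorted first" rule concrete, here is a small sketch that computes the label each example column receives. Taking `min` of a name and its complement gives the same label as the first row of the `np.sort` output in the answer; the names `cols` and `labels` are illustrative only.

```python
# Same translation table as in the answer: A<->T, C<->G
mytrans = str.maketrans('ATCG', 'TAGC')
cols = ["ATTG", "CGGG", "TAAC", "GCCC", "TTTT"]

# Canonical label = alphabetically smaller of the name and its complement
labels = {c: min(c, c.translate(mytrans)) for c in cols}
print(labels)
# {'ATTG': 'ATTG', 'CGGG': 'CGGG', 'TAAC': 'ATTG', 'GCCC': 'CGGG', 'TTTT': 'AAAA'}
```

Both members of each complementary pair map to one label, which is what lets `groupby` combine them.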

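import numpy as np

# Map each base to its complement: A<->T, C<->G
mytrans = str.maketrans('ATCG', 'TAGC')
# Stack names and their complements (2 x n array), sort each pair, keep the smaller label
df.columns = np.sort([df.columns, [x.translate(mytrans) for x in df.columns]], axis=0)[0, :]

# Sum columns that now share a label; groupby(axis=1) is deprecated in pandas >= 2.1, so transpose instead
df.T.groupby(level=0).sum().T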
#   AAAA  ATTG  CGGG
#0     2     3     4
#1     1     7     4
#2     0     0     1
#3     1     2     4
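Since the question asks about a dictionary, here is a hedged variant of the same idea (not part of the accepted answer): build an explicit `{column: canonical label}` dict and pass it to `groupby`, which maps index labels through a dict. As an assumption beyond the answer, unpaired columns keep their original name, so TTTT stays TTTT and the output matches the desired result. The names `canonical` and `result` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"ATTG": [3, 6, 0, 1], "CGGG": [0, 2, 1, 4],
                   "TAAC": [0, 1, 0, 1], "GCCC": [4, 2, 0, 0], "TTTT": [2, 1, 0, 1]})

mytrans = str.maketrans('ATCG', 'TAGC')  # A<->T, C<->G
cols = set(df.columns)

# Canonical label: the smaller of the name and its complement when both columns
# exist; otherwise keep the original name (assumption: unpaired columns are not renamed)
canonical = {
    c: min(c, c.translate(mytrans)) if c.translate(mytrans) in cols else c
    for c in df.columns
}

# groupby maps the transposed frame's index (the old column names) through the dict,
# sums each group, then transposes back so the groups are columns again
result = df.T.groupby(canonical).sum().T
print(result)
```

The labels depend only on the column names, not on their order, so repeated runs give the same columns; the dict also records which columns were merged.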
