Cody Glickman

Reputation: 524

Combine Pandas Columns by Corresponding Dictionary Values

I am looking to quickly combine columns that are genetic complements of each other. I have a large data frame of counts and want to combine columns whose names are complements. I currently have a system that loops over every column name and checks for its complement (see the pseudocode below).

However, this is slow (it checks every column name) and gives different column names depending on the ordering of the columns (i.e. it deletes different complement columns between runs). I was wondering if there is a way to incorporate a dictionary key:value pair to speed up the process and keep the output consistent. I have an example dataframe below with the desired result (ATTG|TAAC and CGGG|GCCC are complements).

import pandas as pd

df = pd.DataFrame({"ATTG": [3, 6, 0, 1], "CGGG": [0, 2, 1, 4],
                   "TAAC": [0, 1, 0, 1], "GCCC": [4, 2, 0, 0], "TTTT": [2, 1, 0, 1]})

## Current Pseudocode
for item in list(df.columns):  # columns is an attribute, not a method; copy it before deleting
    if item in df.columns and complement(item) in df.columns:
        df[item] = df[item] + df[complement(item)]
        del df[complement(item)]

## Desired Result
df_result = pd.DataFrame({"ATTG": [3, 7, 0, 2],"CGGG" : [4, 4, 1, 4], "TTTT": [2, 1, 0, 1]}) 

Upvotes: 1

Views: 526

Answers (1)

ALollz

Reputation: 59549

Translate each column name to its complement, then rename every column to whichever of the two names (original or complement) sorts first. Complementary columns then share a label, which lets you group them and sum.

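import numpy as np

# Translation table mapping each base to its complement (A<->T, C<->G).
mytrans = str.maketrans('ATCG', 'TAGC')
# Stack the original names over their complements, sort each pair, keep the smaller name.
df.columns = np.sort([df.columns, [x.translate(mytrans) for x in df.columns]], axis=0)[0, :]

# Sum the columns that now share a name.
# Note: axis=1 in groupby is deprecated in pandas >= 2.1; df.T.groupby(level=0).sum().T is equivalent.
df.groupby(level=0, axis=1).sum()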
#   AAAA  ATTG  CGGG
#0     2     3     4
#1     1     7     4
#2     0     0     1
#3     1     2     4
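
If you would rather keep the original label (TTTT instead of AAAA) and use an explicit dictionary, as the question asks, one possible variant is sketched below. The names `COMPLEMENT`, `build_canonical_map`, `mapping` and `combined` are made up for illustration, and the sketch assumes every column name contains only the letters A, T, C and G. It maps each column to the first-seen member of its complement pair, so the result depends only on the column order of the input, and it groups on the transposed frame to avoid `axis=1`.

```python
import pandas as pd

df = pd.DataFrame({
    "ATTG": [3, 6, 0, 1], "CGGG": [0, 2, 1, 4],
    "TAAC": [0, 1, 0, 1], "GCCC": [4, 2, 0, 0], "TTTT": [2, 1, 0, 1],
})

# Translation table for base complements (A<->T, C<->G).
COMPLEMENT = str.maketrans("ATCG", "TAGC")


def build_canonical_map(columns):
    """Map every column name to the first-seen member of its complement pair."""
    canonical = {}
    for name in columns:
        partner = name.translate(COMPLEMENT)
        # Reuse the partner's label if it appeared earlier; otherwise keep this name.
        canonical[name] = canonical.get(partner, name)
    return canonical


mapping = build_canonical_map(df.columns)

# Group the transposed frame by canonical label, sum, and transpose back.
# sort=False keeps the groups in order of first appearance.
combined = df.T.groupby(mapping, sort=False).sum().T
print(combined)
```

Building the dictionary is a single pass over the column names with constant-time lookups. The transpose copies the data, so on a very large frame it may be worth comparing against the renaming approach above.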

Upvotes: 1
