Morena

Reputation: 33

Python remove outer quotes in a list of lists made from a data frame column

I have a pandas DataFrame called positive_samples with a column called Gene Class, which holds a pair of genes stored as a list. It looks like this:

enter image description here

The entire data frame looks like this.

enter image description here.

So the Gene Class column is just the other two columns in the data frame combined. I made a list from the Gene Class column as shown below. This takes all the gene pair lists and puts them into a single list.

   # convert the column to a list
   positive_gene_pairs = positive_samples["Gene Class"].tolist()

This is the output.

enter image description here

Each pair is now wrapped in double quotes, which I don't want, because I loop through this list and use the .loc method to locate these pairs in another data frame called new_expression_df, which has them as its index, like this:

enter image description here

for positive_gene_pair in positive_gene_pairs:
    print(new_expression_df.loc[[positive_gene_pair],"GSM144819"])

This throws a KeyError.

enter image description here

It is definitely because of the extra quotes wrapped around each pair, because when I create a list by hand without the quotes, as below, it works just fine.

enter image description here

So my question is: how do I remove the extra quotes to make this work with .loc? That is, how do I build a list just like the one below, but from a data frame column?

pairs = [['YAL013W','YBR103W'],['YAL011W','YMR263W']]

I tried many workarounds such as replace and strip, but none of them worked for me: they are meant for strings, and I was trying to apply them to a list. Is there an easy solution? I just want a list of lists, like the pairs list above, without the extra single or double quotes.

Upvotes: 1

Views: 2145

Answers (2)

jezrael

Reputation: 863541

The column holds strings that look like lists, so first convert each string to a real list:

import ast

positive_gene_pairs = positive_samples["Gene Class"].apply(ast.literal_eval).tolist()

Then drop the extra [] around positive_gene_pair in the .loc call. Change:

for positive_gene_pair in positive_gene_pairs:
    print(new_expression_df.loc[[positive_gene_pair],"GSM144819"])

to:

for positive_gene_pair in positive_gene_pairs:
    print(new_expression_df.loc[positive_gene_pair,"GSM144819"])
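
As a rough illustration of the conversion step, here is a minimal sketch on plain Python data. It assumes each value in the column is the string form of a list; the sample strings are made up, and the list comprehension stands in for what .apply(ast.literal_eval) does to each cell.

```python
import ast

# Hypothetical values, standing in for positive_samples["Gene Class"].tolist()
# when each cell is a string that merely looks like a list.
as_strings = ["['YAL013W', 'YBR103W']", "['YAL011W', 'YMR263W']"]

# ast.literal_eval safely parses each string into a real Python list.
positive_gene_pairs = [ast.literal_eval(s) for s in as_strings]

print(type(as_strings[0]))           # each element starts out as a str
print(type(positive_gene_pairs[0]))  # after parsing, each element is a list
print(positive_gene_pairs)
```

ast.literal_eval only evaluates Python literals, so unlike eval it does not run arbitrary code found in the strings.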

Upvotes: 0

Divyessh

Reputation: 2721

Define a function:

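def listup(ini_list):
    # Convert a string such as "['YAL013W', 'YBR103W']" to a list:
    # drop the outer brackets, split on ", ", then strip each element's quotes
    res = [item.strip("'\"") for item in ini_list.strip('][').split(', ')]

    return res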

Then change

positive_gene_pairs = positive_samples["Gene Class"].tolist()

to

positive_gene_pairs = positive_samples["Gene Class"].apply(listup).tolist()
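
When splitting by hand like this, the inner quote characters need attention. A small sketch on a made-up string, assuming the column values look like "['YAL013W', 'YBR103W']":

```python
raw = "['YAL013W', 'YBR103W']"  # hypothetical cell value

# Stripping the brackets and splitting leaves the inner quotes on each piece.
naive = raw.strip("][").split(", ")

# Stripping the quote characters from each piece yields the bare gene names.
cleaned = [item.strip("'\"") for item in naive]

print(naive)
print(cleaned)
```

This approach depends on the exact ", " separator and quoting style, so it is more fragile than parsing the string as a Python literal.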

Upvotes: 0
