Reputation: 33
I have a pandas data frame called positive_samples with a column called Gene Class, where each entry is a pair of genes stored as a list.
The Gene Class column is just the other two columns in the data frame combined. I made a list from it as shown below, which takes all the gene-pair lists and puts them into a single list.
# convert the column to a list
positive_gene_pairs = positive_samples["Gene Class"].tolist()
In the output, each pair is wrapped in double quotes, which I don't want, because I loop through this list and use the .loc method to look up these pairs in another data frame called new_expression_df, which has them as its index:
for positive_gene_pair in positive_gene_pairs:
    print(new_expression_df.loc[[positive_gene_pair], "GSM144819"])
This throws a KeyError.
It is definitely caused by the extra quotes around each pair, because when I create a list without the quotes, like the one below, it works just fine.
So my question is: how do I remove the extra quotes to make this work with .loc? In other words, how do I build a list just like the one below, but from a data frame column?
pairs = [['YAL013W','YBR103W'],['YAL011W','YMR263W']]
I tried several workarounds such as replace and strip, but none of them worked, since they are meant for strings and I was applying them to a list. Is there an easy solution? I just want a list of lists like the pairs list above, without the extra single or double quotes.
Upvotes: 1
Views: 2145
Reputation: 863541
Convert the strings to lists first:
import ast
positive_gene_pairs = positive_samples["Gene Class"].apply(ast.literal_eval).tolist()
And then remove the extra [] around the pair in .loc, changing:
for positive_gene_pair in positive_gene_pairs:
    print(new_expression_df.loc[[positive_gene_pair], "GSM144819"])
to:
for positive_gene_pair in positive_gene_pairs:
    print(new_expression_df.loc[positive_gene_pair, "GSM144819"])
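As a quick self-contained check, the sketch below uses a made-up two-row frame in place of positive_samples. The real data is not shown here, so the exact string format of each cell is an assumption. It shows that the cells are plain strings before the conversion and real lists afterwards:

```python
import ast

import pandas as pd

# Hypothetical stand-in for positive_samples: each "Gene Class" cell is
# assumed to be a *string* that merely looks like a list.
positive_samples = pd.DataFrame(
    {"Gene Class": ["['YAL013W', 'YBR103W']", "['YAL011W', 'YMR263W']"]}
)

# Without conversion, tolist() returns a list of strings.
raw_pairs = positive_samples["Gene Class"].tolist()
print(type(raw_pairs[0]))

# ast.literal_eval safely parses each string into an actual Python list.
positive_gene_pairs = (
    positive_samples["Gene Class"].apply(ast.literal_eval).tolist()
)
print(positive_gene_pairs)
```

The quotes seen when printing the original list are only how Python displays strings, which is why replace and strip on the list itself have no effect.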
Upvotes: 0
Reputation: 2721
Define a function:
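def listup(ini_list):
    # Convert a string such as "['A', 'B']" into the list ['A', 'B']
    items = ini_list.strip('][').split(', ')
    # Strip the quote characters left around each element
    return [item.strip('\'"') for item in items]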
Change from
positive_gene_pairs = positive_samples["Gene Class"].tolist()
to
positive_gene_pairs = positive_samples["Gene Class"].apply(listup).tolist()
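One thing to watch with the manual split: splitting on ', ' alone leaves the inner quotes attached to each gene name, so each item needs its own strip. A minimal sketch on a sample string, assuming the cells look like "['YAL013W', 'YBR103W']" (split_pair is a hypothetical helper name used only for this illustration):

```python
def split_pair(text):
    # Hypothetical helper: remove the outer brackets, split on ", ",
    # then strip the quote characters from each piece.
    items = text.strip("][").split(", ")
    return [item.strip("'\"") for item in items]


sample = "['YAL013W', 'YBR103W']"

# Splitting alone keeps the quotes inside each element.
naive = sample.strip("][").split(", ")
print(naive)

# Stripping each element gives clean gene names.
print(split_pair(sample))
```

This only works if every cell uses exactly ', ' as the separator. ast.literal_eval does not depend on the spacing.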
Upvotes: 0