martin

Reputation: 1185

How to change value in two DataFrame columns in python

I have a CSV file with 6 columns. I load it into memory and process it with a few methods. The result is a DataFrame with 4 columns that looks like this:

name number Allele Allele
aaa  111     A       B
aab  112     A       A
aac  113     A       B

But now I have a CSV in a different (non-Illumina) format, and I need to convert it to the format above.

My current result is:

name number Allele1 Allele2
aaa  111     A       C
aab  112     A       G
aac  113     G       G

I know how to map the values, for example AG == AB, GG == AA, CC == AA (too), etc. But is there a better way to do this than a for loop?

Let's say:

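for line in range(len(dataframe)):
    # Assumes the default integer index, so .loc[line, ...] addresses row number `line`
    if dataframe.loc[line, 'Allele1'] == 'A' and dataframe.loc[line, 'Allele2'] == 'G':
        dataframe.loc[line, 'Allele1'] = 'A'
        dataframe.loc[line, 'Allele2'] = 'B'
    # elif ...: further branches for the other genotypes, etc.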

I feel this is not the best way to accomplish the task. Maybe there is a better way in pandas or plain Python?

I need to convert that format to Illumina format because the database works with Illumina.

In Illumina: AA = AA, CC, GG; AB = AC, AG, AT, CT, GT; BB = CG, TT, etc.

So if a row has A in column Allele1 and T in column Allele2, the edited row will be: Allele1 = A, Allele2 = B.

The expected result is:

name number Allele1 Allele2
 aaa  111     A       B
 aab  112     A       B
 aac  113     A       A

The result MUST have 4 columns.

Upvotes: 0

Views: 64

Answers (2)

Rajat Jain

Reputation: 2032

You can try this (to convert AG to AB):

mask = (df['Allele1'] == 'A') & (df['Allele2'] == 'G')  # parentheses are required around each comparison
df.loc[mask, ['Allele1', 'Allele2']] = ['A', 'B']
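
The same mask idea can be repeated for every genotype. A minimal sketch, assuming the mapping listed in the question (AA, CC, GG become AA; AC, AG, AT, CT, GT become AB; CG, TT become BB) and that the alleles always appear in that order. The `rules` dict and the sample frame are illustrative and not part of the original answer:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["aaa", "aab", "aac"],
    "number": [111, 112, 113],
    "Allele1": ["A", "A", "G"],
    "Allele2": ["C", "G", "G"],
})

# Hypothetical lookup table built from the mapping given in the question
rules = {}
for pair in ["AA", "CC", "GG"]:
    rules[pair] = ("A", "A")
for pair in ["AC", "AG", "AT", "CT", "GT"]:
    rules[pair] = ("A", "B")
for pair in ["CG", "TT"]:
    rules[pair] = ("B", "B")

# Build masks from copies of the original columns so that rows already
# converted are not matched a second time by a later rule
a1 = df["Allele1"].copy()
a2 = df["Allele2"].copy()
for pair, (new1, new2) in rules.items():
    mask = (a1 == pair[0]) & (a2 == pair[1])
    df.loc[mask, "Allele1"] = new1
    df.loc[mask, "Allele2"] = new2

print(df)
```

This still loops, but over the handful of rules rather than over every row, so each step is a vectorized column operation.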

Upvotes: 0

Newbie_2006

Reputation: 72

Have you tried using pandas.DataFrame.replace? For instance:

df['Allele1'] = df['Allele1'].replace(['GC', 'CC'], 'AA')

That line replaces the values GC and CC in the column "Allele1" with the one you are looking for, AA. Note that replace returns a new object, so the result has to be assigned back. You can apply the same logic to all the substitutions you need, and if you want to do it across the whole DataFrame, just don't specify the column:

df = df.replace(['GC', 'CC'], 'AA')
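
Because `replace` matches whole cell values, two-letter genotypes such as GC can only match when both alleles sit in a single column. One possible way to work under that constraint, sketched here with the sample data from the question: join the two allele columns into a temporary Series, look each pair up in a dict, then split the result back into two columns. The `illumina` dict is filled from the mapping in the question, and its name is only illustrative. The sketch uses `Series.map` instead of `replace` so that pairs missing from the dict become NaN rather than passing through unchanged:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["aaa", "aab", "aac"],
    "number": [111, 112, 113],
    "Allele1": ["A", "A", "G"],
    "Allele2": ["C", "G", "G"],
})

# Hypothetical lookup: nucleotide pair -> Illumina A/B genotype,
# using only the pairs listed in the question
illumina = {
    "AA": "AA", "CC": "AA", "GG": "AA",
    "AC": "AB", "AG": "AB", "AT": "AB", "CT": "AB", "GT": "AB",
    "CG": "BB", "TT": "BB",
}

# Join the two columns into one genotype string per row, e.g. "AC"
genotype = df["Allele1"] + df["Allele2"]

# Look up every pair at once; pairs not in the dict become NaN
converted = genotype.map(illumina)

# Split the two-letter result back into the original two columns
df["Allele1"] = converted.str[0]
df["Allele2"] = converted.str[1]

print(df)
```

The frame keeps its 4 columns, and any NaN left in the allele columns points to a pair that the lookup does not cover yet.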

Upvotes: 1
