Replace function in Python is not working (other answers did not solve my problem)

Question

I know this is a repeated question, but I tried the answers to other questions and could not solve the issue.

In summary, I want to replace 0 by 'A A ', 1 by 'A B ', 2 by 'B B ', and 5 by '0 0 '.

My input data file (datafile.txt) has the format shown below, and I want to replace values only in the "Geno" column (the real dataset has a million lines).

Sample	Geno
ID1	11010111151
ID2	12000120022
ID3	12055520022
ID4	12000120022

The pipeline I am using is:

import pandas as pd
#input file
fin = pd.read_table('dataframe.txt',sep = ' ', header=None)
df = pd.DataFrame(fin)
geno = (df.iloc[: , 1:])
id = (df.iloc[: , 0])
geno = pd.DataFrame(geno)
geno2 = geno.replace("0","A A ").replace("1","A B ").replace("2","B B ").replace("5","0 0 ")

I appreciate your help! I was doing this in bash (using awk), but it was taking a long time, so I decided to try Python since I believe it would be faster. PS: I am a beginner in Python. Thank you again.

Nk03 · Accepted Answer

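TRY the .str accessor, which replaces substrings inside each value (DataFrame.replace with plain strings only matches whole cell values). This assumes the file was read with its header row, so the column is named Geno: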

df.Geno = df.Geno.astype(str).str.replace("0","A A ").str.replace("1","A B ").str.replace("2","B B ").str.replace("5","0 0 ")

OUTPUT:

  Sample                                          Geno
0    ID1  A B A B A A A B A A A B A B A B A B 0 0 A B 
1    ID2  A B B B A A A A A A A B B B A A A A B B B B 
2    ID3  A B B B A A 0 0 0 0 0 0 B B A A A A B B B B 
3    ID4  A B B B A A A A A A A B B B A A A A B B B B
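
Since every code is a single character, the four chained replacements could also be done in one pass with a translation table. The block below is only a sketch using the standard library, not part of the original answer: the function name `recode` and the in-memory stand-in for datafile.txt are hypothetical, and it assumes the file is tab-separated with a header row, as in the sample shown in the question.

```python
import io

# Per-character lookup table: each digit code maps to its genotype string.
CODES = str.maketrans({"0": "A A ", "1": "A B ", "2": "B B ", "5": "0 0 "})


def recode(lines):
    """Yield tab-separated rows with the Geno column recoded (hypothetical helper)."""
    lines = iter(lines)
    header = next(lines, None)
    if header is None:  # empty input: nothing to yield
        return
    yield header.rstrip("\n")  # header row passes through unchanged
    for line in lines:
        sample, geno = line.rstrip("\n").split("\t")
        # translate() looks at each character once, so the "0" inside
        # the output "0 0 " is never replaced a second time.
        yield sample + "\t" + geno.translate(CODES)


# In-memory stand-in for datafile.txt (assumed tab-separated with a header).
sample_file = io.StringIO(
    "Sample\tGeno\n"
    "ID1\t11010111151\n"
    "ID2\t12000120022\n"
    "ID3\t12055520022\n"
    "ID4\t12000120022\n"
)

result = list(recode(sample_file))
for row in result:
    print(row)
```

For the real file, the same generator could be fed an open file handle and its rows written out one at a time, which avoids holding a million lines in memory. In pandas, `df.Geno.astype(str).str.translate(CODES)` should give the same strings with a single call.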
