Python learner

Reputation: 23

Replace function in Python is not working (other answers did not solve my problem)

I know this is a duplicate question, but I tried the answers to the other questions and could not solve the issue.

In summary, I want to replace 0 with 'A A ', 1 with 'A B ', 2 with 'B B ', and 5 with '0 0 '.

The format of my input data file (datafile.txt) is shown below. I want to replace only the values in the "Geno" column (the real dataset has a million lines).

Sample Geno
ID1 11010111151
ID2 12000120022
ID3 12055520022
ID4 12000120022

The pipeline I am using is:

import pandas as pd
# input file
fin = pd.read_table('datafile.txt',sep = ' ', header=None)
df = pd.DataFrame(fin)
geno = (df.iloc[: , 1:])
id = (df.iloc[: , 0])
geno = pd.DataFrame(geno)
geno2 = geno.replace("0","A A ").replace("1","A B ").replace("2","B B ").replace("5","0 0 ")

I appreciate your help! I was doing this in bash (using awk), but it was taking a long time, so I decided to try Python since I believe it would be faster. PS: I am a beginner in Python. Thank you again.

Upvotes: 2

Views: 57

Answers (2)

Henry Ecker

Reputation: 35686

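Series.replace with a dict is also an option. Pass regex=True so that every digit inside each string is replaced, not only cells whose whole value equals a key: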

import pandas as pd

df = pd.DataFrame({
    'Sample': ['ID1', 'ID2', 'ID3', 'ID4'],
    'Geno': [11010111151, 12000120022, 12055520022, 12000120022]
})

df['Geno'] = df['Geno'].astype(str).replace({
    '0': ' A A',
    '1': ' A B',
    '2': ' B B',
    '5': ' 0 0'
}, regex=True).str.lstrip()

print(df)

df:

  Sample                                          Geno
0    ID1   A B A B A A A B A A A B A B A B A B 0 0 A B
1    ID2   A B B B A A A A A A A B B B A A A A B B B B
2    ID3   A B B B A A 0 0 0 0 0 0 B B A A A A B B B B
3    ID4   A B B B A A A A A A A B B B A A A A B B B B
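
To illustrate why regex=True matters, here is a minimal sketch on a made-up two-row Series (the values "0" and "101" are illustrative only, not taken from the question's file). Without the flag, replace only swaps cells whose entire value equals a key, which is likely why the chained DataFrame.replace calls in the question appear to do nothing:

```python
import pandas as pd

# Hypothetical toy data: one cell that equals a key, one that only contains keys.
geno = pd.Series(["0", "101"])
mapping = {"0": "A A ", "1": "A B "}

# Default behaviour: a cell is replaced only when its whole value equals a key.
whole_cell = geno.replace(mapping)

# regex=True: each key is treated as a pattern and replaced wherever it occurs.
per_char = geno.replace(mapping, regex=True)

print(whole_cell.tolist())
print(per_char.tolist())
```

In the first result the "101" cell is left untouched; in the second, every digit in it is expanded.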

Upvotes: 2

Nk03

Reputation: 14949

TRY:

df.Geno = df.Geno.astype(str).str.replace("0","A A ").str.replace("1","A B ").str.replace("2","B B ").str.replace("5","0 0 ")

OUTPUT:

  Sample                                          Geno
0    ID1  A B A B A A A B A A A B A B A B A B 0 0 A B 
1    ID2  A B B B A A A A A A A B B B A A A A B B B B 
2    ID3  A B B B A A 0 0 0 0 0 0 B B A A A A B B B B 
3    ID4  A B B B A A A A A A A B B B A A A A B B B B 
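
Note that the chained version depends on the order of the calls: "5" has to be handled last, otherwise the zeros it produces would be turned into "A A " by the "0" step. As a possible alternative (a sketch, not benchmarked on the million-line file), Series.str.translate with a table built by str.maketrans maps every character in a single pass, so the order does not matter. The two-row frame below is a shortened copy of the sample data:

```python
import pandas as pd

# Shortened sample; Geno is kept as text so each digit is one character.
df = pd.DataFrame({
    "Sample": ["ID1", "ID3"],
    "Geno": ["11010111151", "12055520022"],
})

# str.maketrans accepts a dict of single-character keys mapped to
# replacement strings of any length.
table = str.maketrans({"0": "A A ", "1": "A B ", "2": "B B ", "5": "0 0 "})

# Each input character is translated exactly once, so the "0" emitted by
# the "5" rule is never picked up again by the "0" rule.
df["Geno"] = df["Geno"].str.translate(table)

print(df)
```

When loading the real file, passing dtype=str to the reader (for example pd.read_csv('datafile.txt', sep=' ', dtype=str)) should keep the Geno column as text, assuming the first line of the file is the "Sample Geno" header.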

Upvotes: 3
