jamo

Reputation: 91

Replace strings with a subset of their characters

I have a data frame like the one below:

s1 AA AG AG GG AA
s2 GTTGTT GTTGTT GTTGTT GTTGTT GTTGTT
S3 TT CC TC TT TC
S3 AGTTAGTT AGTTAGTT AGTTAGTT AGTTAGTT AGTTAGTT
S3 GCGCGCGC GCGCGCGC GCGCGCGC GCGCGCGC GCGCGCGC

I want to find every string in the data frame that has more than two characters (like GTTGTT), split it into two halves (all the strings have even length), e.g. GTT GTT, and then take the first character of each half (GG). My data frame would then look like this:

s1 AA AG AG GG AA
s2 GG GG GG GG GG
S3 TT CC TC TT TC
S3 AA AA AA AA AA
S3 GG GG GG GG GG

Any suggestions are appreciated. Thank you in advance.

Upvotes: 0

Views: 34

Answers (1)

Henry Yik

Reputation: 22503

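One way is to use applymap. For a two-character string, x[len(x)//2] is simply the second character, so short strings come out unchanged and no explicit length check is needed: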

import pandas as pd

df = pd.DataFrame({'num': {0: 's1', 1: 's2', 2: 'S3', 3: 'S3', 4: 'S3'}, 
                   'A': {0: 'AA', 1: 'GTTGTT', 2: 'TT', 3: 'AGTTAGTT', 4: 'GCGCGCGC'}, 
                   'B': {0: 'AG', 1: 'GTTGTT', 2: 'CC', 3: 'AGTTAGTT', 4: 'GCGCGCGC'}, 
                   'C': {0: 'AG', 1: 'GTTGTT', 2: 'TC', 3: 'AGTTAGTT', 4: 'GCGCGCGC'}, 
                   'D': {0: 'GG', 1: 'GTTGTT', 2: 'TT', 3: 'AGTTAGTT', 4: 'GCGCGCGC'}, 
                   'E': {0: 'AA', 1: 'GTTGTT', 2: 'TC', 3: 'AGTTAGTT', 4: 'GCGCGCGC'}})

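# Keep the first character of each half of every cell, skipping the label column.
# Note: applymap is deprecated since pandas 2.1; on newer versions use DataFrame.map instead.
df.iloc[:, 1:6] = df.iloc[:, 1:6].applymap(lambda x: x[0] + x[len(x) // 2])

print(df)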

# Output:
  num   A   B   C   D   E
0  s1  AA  AG  AG  GG  AA
1  s2  GG  GG  GG  GG  GG
2  S3  TT  CC  TC  TT  TC
3  S3  AA  AA  AA  AA  AA
4  S3  GG  GG  GG  GG  GG
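
If you would rather make the "more than two characters" rule from the question explicit, the per-cell logic can be sketched as a small named function. This is only an illustration: the name first_of_each_half is made up here, and the example runs on plain lists copied from the question rather than on a DataFrame.

```python
def first_of_each_half(cell):
    """Return the first character of each half of an even-length string.

    Strings of two characters or fewer are returned unchanged.
    (Hypothetical helper, not part of pandas.)
    """
    if len(cell) <= 2:
        return cell
    half = len(cell) // 2
    return cell[0] + cell[half]


# Rows copied from the question; the first item of each row is the label.
rows = [
    ["s1", "AA", "AG", "AG", "GG", "AA"],
    ["s2", "GTTGTT", "GTTGTT", "GTTGTT", "GTTGTT", "GTTGTT"],
    ["S3", "TT", "CC", "TC", "TT", "TC"],
    ["S3", "AGTTAGTT", "AGTTAGTT", "AGTTAGTT", "AGTTAGTT", "AGTTAGTT"],
    ["S3", "GCGCGCGC", "GCGCGCGC", "GCGCGCGC", "GCGCGCGC", "GCGCGCGC"],
]

# Leave the label untouched and convert every other cell.
converted = [[row[0]] + [first_of_each_half(c) for c in row[1:]] for row in rows]

for row in converted:
    print(" ".join(row))
```

The same function could be passed to applymap in place of the lambda, which should give the same result for this data.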

Upvotes: 2
