Daniyar Karabayev

Reputation: 3

Pandas/Python: Replacing column values from another column values using .replace()

Data:


import pandas as pd

# 'data' instead of 'dict', so the built-in dict type is not shadowed
data = {'REF': ['A', 'B', 'C', 'D'],
        'ALT': [['E', 'F'], ['G'], ['H', 'I', 'J'], ['K', 'L']],
        'sample1': ['0', '0', '1', '2'],
        'sample2': ['1', '0', '3', '0']
        }
df = pd.DataFrame(data)

Problem: I need to replace the values in columns 'sample1' and 'sample2'. If the value is 0, it should be replaced with the value from the 'REF' column. If it is 1, it should be replaced with the first element of the list in the 'ALT' column; if 2, with the second element of that list, and so on.

My solution:

sample_list = ['sample1', 'sample2']
for sample in sample_list:

    # replace 0s
    df[sample] = df.apply(lambda x: x[sample].replace('0', x['REF']), axis=1)
    # replace other numbers
    for i in range(1, 4):
        try:
            df[sample] = df.apply(lambda x: x[sample].replace(f'{i}', x['ALT'][i-1]), axis=1)
        except:
            pass

However, because the list length differs from row to row in the 'ALT' column, an IndexError seems to be raised, and values greater than 1 are not replaced. You can see it in the output:

'{"REF":{"0":"A","1":"B","2":"C","3":"D"},"ALT":{"0":["E","F"],"1":["G"],"2":["H","I","J"],"3":["K"]},"sample1":{"0":"A","1":"B","2":"H","3":"2"},"sample2":{"0":"E","1":"B","2":"3","3":"D"}}'

How can I solve it?
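
A likely explanation, sketched here under the assumption that the last ALT entry was meant to be ['K', 'L']: Python evaluates x['ALT'][i-1] for every row before .replace() is even called, so a single row with a short list raises IndexError, and because the try/except wraps the whole df.apply call, the entire column is skipped for that i. The helper names one_pass and resolve below are made up for illustration; resolve does one lookup per row and only indexes ALT when the value asks for it.

```python
import pandas as pd

# Data from the question, assuming the last ALT entry was meant to be ['K', 'L'].
df = pd.DataFrame({
    "REF": ["A", "B", "C", "D"],
    "ALT": [["E", "F"], ["G"], ["H", "I", "J"], ["K", "L"]],
    "sample1": ["0", "0", "1", "2"],
    "sample2": ["1", "0", "3", "0"],
})


def one_pass(frame, sample, i):
    """Hypothetical helper: one iteration of the inner loop from the question."""
    try:
        return frame.apply(lambda x: x[sample].replace(f"{i}", x["ALT"][i - 1]), axis=1)
    except IndexError:
        # x["ALT"][i - 1] is evaluated for every row, even rows whose value
        # is not i, so one short list aborts the replacement for the whole column.
        return None


first_pass = one_pass(df, "sample1", 1)   # every ALT list has a first element
second_pass = one_pass(df, "sample1", 2)  # row 1 has ALT == ['G'], so this fails


def resolve(row, sample):
    """Hypothetical helper: look the value up once per row."""
    idx = int(row[sample])
    if idx == 0:
        return row["REF"]
    alt = row["ALT"]
    # Leave the value untouched if it points past the end of this row's ALT list.
    return alt[idx - 1] if idx <= len(alt) else row[sample]


fixed = df.copy()
for sample in ["sample1", "sample2"]:
    fixed[sample] = df.apply(resolve, axis=1, sample=sample)
print(fixed)
```

Leaving out-of-range values unchanged is a design choice for this sketch; raising an error instead may be preferable if such values indicate bad input.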

UPDATE: If there is a NaN value in sample1 or sample2, I can't convert the values to int, and I don't know how to skip these values.

NaN values should not be converted; they should stay NaN.

Upvotes: 0

Views: 496

Answers (3)

Ismael EL ATIFI

Reputation: 2118

You can concatenate the REF and ALT columns into a single list per row and then use apply:

import numpy as np
import pandas as pd

d = {'REF': ['A', 'B', 'C', 'D'],
     'ALT': [['E', 'F'], ['G'], ['H', 'I', 'J'], ['K', 'L']],
     'sample1': ['0', '0', '1', '2'],
     'sample2': ['1', '0', '3', '0']
     }
df = pd.DataFrame(d)

# Concatenate REF and ALT, so index 0 is REF and index k is the k-th ALT element
df["REF_ALT"] = df["REF"].map(list) + df["ALT"]
# pd.isna is used instead of np.isnan, which raises TypeError on string values
df["sample1"] = df.apply(lambda row: np.nan if pd.isna(row["sample1"]) else row["REF_ALT"][int(row["sample1"])], axis=1)
df["sample2"] = df.apply(lambda row: np.nan if pd.isna(row["sample2"]) else row["REF_ALT"][int(row["sample2"])], axis=1)
df.pop("REF_ALT")  # drop the temporary column
df
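
If there are many sample columns, the same idea can be wrapped in a small helper and applied in a loop. This is only a sketch: the NaN values in the data below are made up for illustration, and lookup is a hypothetical helper name, not something from pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "REF": ["A", "B", "C", "D"],
    "ALT": [["E", "F"], ["G"], ["H", "I", "J"], ["K", "L"]],
    "sample1": ["0", np.nan, "1", "2"],  # NaN values added for illustration
    "sample2": ["1", "0", "3", np.nan],
})

# REF first, then the ALT elements, so position k is the replacement for value k.
choices = [[ref] + alt for ref, alt in zip(df["REF"], df["ALT"])]


def lookup(column):
    """Hypothetical helper: map each value to its row's choice, keeping NaN as NaN."""
    return [np.nan if pd.isna(v) else opts[int(v)] for v, opts in zip(column, choices)]


for sample in ["sample1", "sample2"]:
    df[sample] = lookup(df[sample])
print(df)
```

Building the list of choices once avoids adding and removing a temporary column for every sample column.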

Upvotes: 0

anon01

Reputation: 11181

A simple solution:

import pandas as pd

df = pd.DataFrame.from_dict({
 'REF': {0: 'A', 1: 'B', 2: 'C', 3: 'D'},
 'ALT': {0: ['E', 'F'], 1: ['G'], 2: ['H', 'I', 'J'], 3: ['K', 'L']},
 'sample1': {0: 0, 1: 0, 2: 1, 3: 2},
 'sample2': {0: 1, 1: 0, 2: 3, 3: 0},
})

# Create a temporary column s holding REF followed by all ALT letters as one string.
# This relies on every REF and ALT entry being a single character.
df["s"] = df.REF + df.ALT.str.join("")
# The integer sample value is then the position of the wanted letter in s
df["sample1"] = df.apply(lambda x: x["s"][x.sample1], axis=1)
df["sample2"] = df.apply(lambda x: x["s"][x.sample2], axis=1)
df = df.drop(columns="s")

output:

  REF        ALT sample1 sample2
0   A     [E, F]       A       E
1   B        [G]       B       B
2   C  [H, I, J]       H       J
3   D     [K, L]       L       D
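
One caveat worth illustrating: joining everything into a single string only gives the intended result when each REF and ALT entry is one character long. The data below is hypothetical (multi-character entries do not appear in the question); it compares the joined-string lookup with a list-based lookup.

```python
import pandas as pd

# Hypothetical data with multi-character entries, not taken from the question.
df = pd.DataFrame({
    "REF": ["AT", "B"],
    "ALT": [["G", "GTC"], ["CC"]],
    "sample1": [2, 1],
})

# Joined-string approach: position 2 of "ATGGTC" is a single character.
joined = df["REF"] + df["ALT"].str.join("")
by_char = [s[i] for s, i in zip(joined, df["sample1"])]

# List approach: position 2 of ["AT", "G", "GTC"] is a whole entry.
items = [[ref] + alt for ref, alt in zip(df["REF"], df["ALT"])]
by_item = [opts[i] for opts, i in zip(items, df["sample1"])]

print(by_char)
print(by_item)
```

With single-letter data such as the question's, both approaches give the same result.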

Upvotes: 0

Dani Mesejo

Reputation: 61930

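You could do:

import numpy as np

# Note: .eq(0) only matches integer zeros. If the columns hold strings such as '0',
# cast them with .astype(int) before comparing.
df['sample1'] = np.where(df['sample1'].eq(0), df['REF'],
                         [v[max(i - 1, 0)] for v, i in zip(df['ALT'], df['sample1'].astype(int))])

df['sample2'] = np.where(df['sample2'].eq(0), df['REF'],
                         [v[max(i - 1, 0)] for v, i in zip(df['ALT'], df['sample2'].astype(int))])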

print(df)

Output

  REF        ALT sample1 sample2
0   A     [E, F]       E       E
1   B        [G]       G       G
2   C  [H, I, J]       H       J
3   D        [K]       K       K

Note that I used a different input, since the one in your example is not valid.
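
For reference, here is a sketch of the same np.where approach applied to the question's string data, assuming the last ALT entry was meant to be ['K', 'L']. The values are cast to int once, so the comparison with 0 and the list indexing use the same integers; the variable names idx and alt_pick are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "REF": ["A", "B", "C", "D"],
    "ALT": [["E", "F"], ["G"], ["H", "I", "J"], ["K", "L"]],
    "sample1": ["0", "0", "1", "2"],
    "sample2": ["1", "0", "3", "0"],
})

# A string is never equal to the integer 0, so this comparison is False everywhere.
string_zero_matches = pd.Series(["0", "0", "1", "2"], dtype=object).eq(0).tolist()

for sample in ["sample1", "sample2"]:
    idx = df[sample].astype(int)  # cast once, reuse for both steps
    # max(i - 1, 0) keeps the index valid for rows where i == 0;
    # those rows are overwritten with REF by np.where anyway.
    alt_pick = [v[max(i - 1, 0)] for v, i in zip(df["ALT"], idx)]
    df[sample] = np.where(idx.eq(0), df["REF"], alt_pick)

print(df)
```

This version still raises IndexError if a value points past the end of a row's ALT list, and astype(int) fails on NaN, so it only fits clean input.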

Upvotes: 1
