Reputation: 3
Data:
import pandas as pd
# named `data` rather than `dict` to avoid shadowing the built-in
data = {'REF': ['A','B','C','D'],
        'ALT': [['E','F'], ['G'], ['H','I','J'], ['K,L']],
        'sample1': ['0', '0', '1', '2'],
        'sample2': ['1', '0', '3', '0']
        }
df = pd.DataFrame(data)
Problem:
I need to replace the values in columns 'sample1' and 'sample2'. If the value is 0, it should be replaced with the value from the 'REF' column. If it is 1, it should be replaced with the first element of the list in the 'ALT' column; if 2, with the second element of that list, and so on.
My Solution:
sample_list = ['sample1', 'sample2']
for sample in sample_list:
    #replace 0s
    df[sample] = df.apply(lambda x: x[sample].replace('0', x['REF']), axis=1)
    #replace other numbers
    for i in range(1,4):
        try:
            df[sample] = df.apply(lambda x: x[sample].replace(f'{i}', x['ALT'][i-1]), axis=1)
        except:
            pass
However, because the lists in the 'ALT' column have different lengths, an IndexError seems to be raised, and values greater than 1 are not replaced. You can see it in the output:
'{"REF":{"0":"A","1":"B","2":"C","3":"D"},"ALT":{"0":["E","F"],"1":["G"],"2":["H","I","J"],"3":["K"]},"sample1":{"0":"A","1":"B","2":"H","3":"2"},"sample2":{"0":"E","1":"B","2":"3","3":"D"}}'
How can I solve it?
UPDATE: If there is a NaN value in sample1 or sample2, I can't convert the values to int, and I don't know how to skip these values.
So NaN values should not be converted and should stay NaN.
Expected output:
Upvotes: 0
Views: 496
Reputation: 2118
Using a simple concatenation of the REF and ALT columns and apply:
import numpy as np
import pandas as pd
d = {'REF': ['A','B','C','D'],
     'ALT': [['E','F'], ['G'], ['H','I','J'], ['K','L']],
     'sample1': ['0', '0', '1', '2'],
     'sample2': ['1', '0', '3', '0']
     }
df = pd.DataFrame(d)
df["REF_ALT"] = df["REF"].map(list) + df["ALT"]  # concatenate REF and ALT into one list per row
# pd.isna is used instead of np.isnan because the sample values are strings
df["sample1"] = df.apply(lambda row: np.nan if pd.isna(row["sample1"]) else row["REF_ALT"][int(row["sample1"])], axis=1)
df["sample2"] = df.apply(lambda row: np.nan if pd.isna(row["sample2"]) else row["REF_ALT"][int(row["sample2"])], axis=1)
df.pop("REF_ALT")  # drop the temporary column
df
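The same lookup can be written once and reused for any number of sample columns. The following is only a minimal sketch: `pick_allele` is a hypothetical helper name, the NaN in `sample1` is added here purely to exercise the NaN case from the question's update, and returning NaN for an out-of-range code is an assumption about the desired behaviour. Doing the lookup row by row also avoids the problem in the question's loop, where `x['ALT'][i-1]` is evaluated for every row and fails on the shortest list.

```python
import numpy as np
import pandas as pd


def pick_allele(ref, alt, code):
    """Return ref for code 0 and alt[code - 1] otherwise; NaN stays NaN."""
    if pd.isna(code):
        return np.nan
    alleles = [ref] + list(alt)  # index 0 is REF, 1.. are the ALT entries
    idx = int(code)
    # Assumption: a code with no matching allele becomes NaN instead of raising
    return alleles[idx] if 0 <= idx < len(alleles) else np.nan


df = pd.DataFrame({
    "REF": ["A", "B", "C", "D"],
    "ALT": [["E", "F"], ["G"], ["H", "I", "J"], ["K", "L"]],
    "sample1": ["0", "0", "1", np.nan],  # NaN added for illustration
    "sample2": ["1", "0", "3", "0"],
})

for col in ["sample1", "sample2"]:
    df[col] = [pick_allele(r, a, c)
               for r, a, c in zip(df["REF"], df["ALT"], df[col])]

print(df)
```

A plain list comprehension over `zip` is used instead of `df.apply(..., axis=1)` so that no temporary column is needed.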
Upvotes: 0
Reputation: 11181
A simple solution:
import pandas as pd
df = pd.DataFrame.from_dict({
    'REF': {0: 'A', 1: 'B', 2: 'C', 3: 'D'},
    'ALT': {0: ['E', 'F'], 1: ['G'], 2: ['H', 'I', 'J'], 3: ['K', 'L']},
    'sample1': {0: 0, 1: 0, 2: 1, 3: 2},
    'sample2': {0: 1, 1: 0, 2: 3, 3: 0},
})
# create a temp col s that holds REF and all ALT letters as a single string:
df["s"] = df.REF + df.ALT.str.join("")
# index into that string with the integer code from each sample column
df["sample1"] = df.apply(lambda x: x["s"][x.sample1], axis=1)
df["sample2"] = df.apply(lambda x: x["s"][x.sample2], axis=1)
df = df.drop(columns="s")
output:
  REF        ALT sample1 sample2
0   A     [E, F]       A       E
1   B        [G]       B       B
2   C  [H, I, J]       H       J
3   D     [K, L]       L       D
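One caveat with the string-join approach: it relies on every allele being a single character, which holds for the example data. A minimal sketch of the difference, assuming a hypothetical multi-character allele such as 'AT' (not present in the question's data):

```python
ref = "A"
alt = ["AT", "G"]  # hypothetical ALT list with a two-character allele

joined = ref + "".join(alt)  # one string, as in the temp column s
alleles = [ref] + alt        # one list, keeping each allele whole

code = 2
by_string = joined[code]   # picks a single character out of the joined string
by_list = alleles[code]    # picks the second ALT allele

print(by_string, by_list)
```

With single-character alleles both lookups agree, so the shorter string version is fine for the data shown above.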
Upvotes: 0
Reputation: 61930
You could do:
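import numpy as np
# Note: .eq(0) only matches integer zeros; if the sample columns hold strings such as '0', cast them to int first.
df['sample1'] = np.where(df['sample1'].eq(0), df['REF'],
                         [v[max(i - 1, 0)] for v, i in zip(df['ALT'], df['sample1'].astype(int))])
df['sample2'] = np.where(df['sample2'].eq(0), df['REF'],
                         [v[max(i - 1, 0)] for v, i in zip(df['ALT'], df['sample2'].astype(int))])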
print(df)
Output
  REF        ALT sample1 sample2
0   A     [E, F]       E       E
1   B        [G]       G       G
2   C  [H, I, J]       H       J
3   D        [K]       K       K
Note that I used a different input, because the one in your example is not valid.
Upvotes: 1