Reputation: 7255
In the following pandas dataframe:
d1 = pd.read_csv('to_count.mcve.txt', sep='\t')
d1 = d1.set_index(['pos'], append=True)
             M1     M2 F1 F2
  pos
0 23  A,B,A,C,D  A,C,B  A  D
1 24  A,B,B,C,B  A,B,A  B  B
2 28  C,B,C,D,E    B,C  E  C
I used the below code to do some counting:
hapX_count = pd.DataFrame()
hapY_count = pd.DataFrame()
for index, lines in d1.iterrows():
    hap_x = lines['F1']
    hap_y = lines['F2']
    x_count = lines.apply(lambda x: x.count(hap_x)/2 if len(x) > 5 else x.count(hap_x))
    y_count = lines.apply(lambda x: x.count(hap_y)/2 if len(x) > 5 else x.count(hap_y))
    hapX_count = hapX_count.append(x_count)
    hapY_count = hapY_count.append(y_count)
print(hapX_count)
Output is:
          F1   F2   M1   M2
(0, 23)  1.0  0.0  1.0  1.0
(1, 24)  1.0  1.0  1.5  1.0
(2, 28)  1.0  0.0  0.5  0.0
How can I get the index values (pos) back as they were in the original data? The index has been collapsed into tuples, and I could use those tuples to look up the positions by hand. But I want to automate the process so that all the index levels are retained, because there will be more than one index level (not just pos) in my original data.
Thanks,
Upvotes: 0
Views: 36
Reputation: 1846
You can replace the two lines above your for loop with the lines below. This creates empty DataFrames whose index has the same level names as the index of d1.
hapX_count = pd.DataFrame(index=d1.index[0:0])
hapY_count = pd.DataFrame(index=d1.index[0:0])
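For readers on pandas 2.0 or later, where DataFrame.append has been removed, here is a rough sketch of one alternative that also keeps the index levels: apply the counting row-wise with DataFrame.apply, which returns a frame on the same index as d1. The inline sample data and the helper name count_hap are my own assumptions for illustration; they are not from the question's file.

```python
import io
import pandas as pd

# Inline stand-in for 'to_count.mcve.txt' (assumed tab-separated layout).
raw = (
    "pos\tM1\tM2\tF1\tF2\n"
    "23\tA,B,A,C,D\tA,C,B\tA\tD\n"
    "24\tA,B,B,C,B\tA,B,A\tB\tB\n"
    "28\tC,B,C,D,E\tB,C\tE\tC\n"
)
d1 = pd.read_csv(io.StringIO(raw), sep="\t")
d1 = d1.set_index(["pos"], append=True)


def count_hap(row, hap):
    """Hypothetical helper: same counting rule as in the question's loop."""
    return row.apply(lambda x: x.count(hap) / 2 if len(x) > 5 else x.count(hap))


# axis=1 applies the function to each row; the result is built on d1's
# index, so every index level (including 'pos') is carried over.
hapX_count = d1.apply(lambda row: count_hap(row, row["F1"]), axis=1)
hapY_count = d1.apply(lambda row: count_hap(row, row["F2"]), axis=1)

print(hapX_count)
print(list(hapX_count.index.names))
```

Because nothing is appended row by row, the result never goes through the tuple-labelled intermediate, and any extra index levels set on d1 should come along unchanged.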
Upvotes: 1