I have a pandas DataFrame with five columns: contig, length, identity, percent and hit. The data is parsed from BLAST output and sorted by contig length and percent match. My goal is to write only one line for each unique contig. An example of the current output:
contig length identity percent hit
contig-100_0 5485 [1341/1341] [100.%] ['hit1']
contig-100_0 5485 [5445/5445] [100.%] ['hit2']
contig-100_0 5485 [59/59] [100.%] ['hit3']
contig-100_1 2865 [2865/2865] [100.%] ['hit1']
contig-100_2 2800 [2472/2746] [90.0%] ['hit1']
contig-100_3 2417 [2332/2342] [99.5%] ['hit1']
contig-100_4 2204 [2107/2107] [100.%] ['hit1']
contig-100_4 2000 [1935/1959] [98.7%] ['hit2']
I want the above to look like this, keeping only the first row for each contig:
contig length identity percent hit
contig-100_0 5485 [1341/1341] [100.%] ['hit1']
contig-100_1 2865 [2865/2865] [100.%] ['hit1']
contig-100_2 2800 [2472/2746] [90.0%] ['hit1']
contig-100_3 2417 [2332/2342] [99.5%] ['hit1']
contig-100_4 2204 [2107/2107] [100.%] ['hit1']
Here is the code I use to produce the output above:
import pandas as pd

# path + i is the BLAST output file being read (defined elsewhere in the script)
df = pd.read_csv(path + i, sep='\t', header=None, engine='python',
                 names=['contig', 'length', 'identity', 'percent', 'hit'])
# Sort by length, then by percent, both descending
df = df.sort_values(['length', 'percent'], ascending=[False, False])
top_hits = df.to_string(justify='left', index=False)
with open('sorted_contigs', 'a') as sortedfile:
    sortedfile.write(top_hits + "\n")
I know about the unique() method in pandas and think the syntax I need is df.contig.unique(), but I am not sure where in the code to place it. I am still learning pandas, so any help is appreciated! Thank you.
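To see why df.contig.unique() cannot simply be dropped into the code above, here is a minimal sketch on a few rows typed in by hand from the sample table (the hand-built DataFrame is an illustration, not the real BLAST file): unique() returns only the distinct contig names, not the rows they belong to.

```python
import pandas as pd

# A few rows copied by hand from the sample table in the question
df = pd.DataFrame({
    'contig': ['contig-100_0', 'contig-100_0', 'contig-100_1'],
    'length': [5485, 5485, 2865],
    'identity': ['[1341/1341]', '[5445/5445]', '[2865/2865]'],
    'percent': ['[100.%]', '[100.%]', '[100.%]'],
    'hit': ["['hit1']", "['hit2']", "['hit1']"],
})

# unique() gives the distinct values of one column (an array of names),
# so the other columns (length, identity, ...) are lost
names = df.contig.unique()
print(names)
print(len(df), "rows, but only", len(names), "distinct contigs")
```

So unique() tells you which contigs exist, but selecting one full row per contig needs a row-level operation on the DataFrame itself.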
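You can do this with DataFrame.groupby(<colname>).head(<num_of_rows>), which returns the first <num_of_rows> rows of each group in the DataFrame's current order. Since your frame is already sorted, head(1) keeps the top row for each contig: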
df.groupby('contig').head(1)
And the output:
contig length identity percent hit
0 contig-100_0 5485 [1341/1341] [100.%] ['hit1']
3 contig-100_1 2865 [2865/2865] [100.%] ['hit1']
4 contig-100_2 2800 [2472/2746] [90.0%] ['hit1']
5 contig-100_3 2417 [2332/2342] [99.5%] ['hit1']
6 contig-100_4 2204 [2107/2107] [100.%] ['hit1']
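Putting this into the question's pipeline might look like the sketch below. It assumes the same tab-separated, header-less layout; the in-memory io.StringIO sample stands in for the real file at path + i and is purely illustrative. The only change from the original code is the groupby('contig').head(1) step between sorting and formatting.

```python
import io

import pandas as pd

# Hypothetical stand-in for one BLAST output file (tab-separated, no header)
blast_tsv = io.StringIO(
    "contig-100_0\t5485\t[1341/1341]\t[100.%]\t['hit1']\n"
    "contig-100_0\t5485\t[5445/5445]\t[100.%]\t['hit2']\n"
    "contig-100_1\t2865\t[2865/2865]\t[100.%]\t['hit1']\n"
    "contig-100_4\t2204\t[2107/2107]\t[100.%]\t['hit1']\n"
    "contig-100_4\t2000\t[1935/1959]\t[98.7%]\t['hit2']\n"
)

df = pd.read_csv(blast_tsv, sep='\t', header=None,
                 names=['contig', 'length', 'identity', 'percent', 'hit'])
# Sort first: head(1) keeps whichever row comes first within each contig
df = df.sort_values(['length', 'percent'], ascending=[False, False])

# One row per contig, taken in the sorted order
top = df.groupby('contig').head(1)

top_hits = top.to_string(justify='left', index=False)
print(top_hits)  # write this to 'sorted_contigs' exactly as in the question
```

Because head(1) respects the existing row order, the sort must come before the groupby. A swapped-in alternative that keeps the same rows here is df.drop_duplicates(subset='contig', keep='first'), which also retains the first occurrence of each contig.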