Finding and retrieving rows for which values of two columns are different

Question

I have a dataframe (~36K rows, 3 columns) part of which is shown below.

	hgnc_symbol	ensembl_gene_id	external_synonym
1	A1CF	ENSG00000148584	ACF
2	A1CF	ENSG00000148584	ACF64
3	A1CF	ENSG00000148584	ACF65
4	A1CF	ENSG00000148584	APOBEC1CF
5	A1CF	ENSG00000148584	ASP
6	A2M	ENSG00000175899	CPAMD5
7	A2M	ENSG00000175899	FWP007
8	A2M	ENSG00000175899	S863-7
9	A2ML1	ENSG00000166535	CPAMD9
10	A2ML1	ENSG00000166535	FLJ25179
11	A2ML1	ENSG00000166535	p170
12	AADAC	ENSG00000114771	CES5A1
13	AADAC	ENSG00000114771	DAC
14	AADACL2	ENSG00000261846	MGC72001
15	AADACL2	ENSG00000197953	MGC72001

I would like to find rows for which the hgnc_symbol column values are the same but the ensembl_gene_id column value is different. I would like to retrieve such entries. The result in this case should look something like this:

	hgnc_symbol	ensembl_gene_id	external_synonym
14	AADACL2	ENSG00000261846	MGC72001
15	AADACL2	ENSG00000197953	MGC72001

I would really appreciate any solutions to this.

Finding and retrieving rows for which values of two columns are different

Answers (1)

Related Questions