Tom_Hanks

Reputation: 527

Summarize overlap with Python

I am analyzing DNA/protein sequence data with Python and have run into a problem. Here is the table of DNA sequences:

[Image not available: table of DNA sequences]

I want to analyze Group1 and Group2 as pairs. For example, AAATTT_TTTCCC and GGGCCC_GGAAA are pairs.
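One way to get the `Group1_Group2` labels used above is to join the two columns row by row. This is a minimal sketch that assumes the table has columns named `Group1` and `Group2`; the rows are hypothetical, since the original table was only posted as an image.

```python
import pandas as pd

# Hypothetical rows built from sequences mentioned in the question;
# the real table was an image, so these values are made up.
df = pd.DataFrame({
    "Group1": ["AAATTT", "GGGCCC", "AAATTT"],
    "Group2": ["TTTCCC", "GGAAA", "AGTC"],
})

# Join the two columns element-wise into one "Group1_Group2" label per row.
df["Pair"] = df["Group1"] + "_" + df["Group2"]
print(df["Pair"].tolist())
```

Because `+` on two string Series works element-wise, this avoids an explicit Python loop over the rows.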

Some sequences appear more than once in this data. For instance, AAATTT appears three times and AGTC appears twice. I want to count these repeated sequences and summarize them as below. I suspect I should use pandas, but I don't know how to do this. Any help would be greatly appreciated.
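Since the desired summary is only available as an image, it is not clear whether a sequence should be counted per column or across both columns. The sketch below computes both, assuming columns named `Group1` and `Group2` and using hypothetical rows in which AAATTT occurs three times and AGTC twice.

```python
import pandas as pd

# Hypothetical rows; column names Group1/Group2 are assumed.
df = pd.DataFrame({
    "Group1": ["AAATTT", "GGGCCC", "AAATTT", "AGTC", "AAATTT"],
    "Group2": ["TTTCCC", "GGAAA", "TTTCCC", "GGGCCC", "AGTC"],
})

# Total occurrences of each sequence, regardless of which column it is in.
total = pd.concat([df["Group1"], df["Group2"]]).value_counts()

# Per-column breakdown: one row per sequence, one count column per group.
# The DataFrame constructor aligns the two count Series on their index,
# so a sequence missing from one column gets NaN there, replaced by 0.
per_column = (
    pd.DataFrame({col: df[col].value_counts() for col in ["Group1", "Group2"]})
    .fillna(0)
    .astype(int)
)
per_column["Total"] = per_column.sum(axis=1)

print(total)
print(per_column)
```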

[Image not available: desired summary of sequence counts]

Upvotes: 2

Views: 363

Answers (1)

sundance

Reputation: 2945

To count the number of appearances of each unique value in a column:

# import pandas
import pandas as pd

# load data into a pandas DataFrame (columns named Group1 and Group2)
df = pd.read_csv("data.csv")

# get counts for each unique Group1 value and print them
print(df["Group1"].value_counts())
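To count pairs instead of single columns, the same idea extends to `DataFrame.value_counts`, which counts each distinct row combination (available in pandas 1.1 and later; on older versions `df.groupby(["Group1", "Group2"]).size()` gives the same counts). This is a sketch, not the answerer's code: the inline DataFrame is a hypothetical stand-in for the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv("data.csv"); the column names
# Group1 and Group2 are assumed from the question.
df = pd.DataFrame({
    "Group1": ["AAATTT", "GGGCCC", "AAATTT", "AGTC"],
    "Group2": ["TTTCCC", "GGAAA", "TTTCCC", "GGGCCC"],
})

# Count each distinct (Group1, Group2) combination, most frequent first.
pair_counts = df[["Group1", "Group2"]].value_counts()

# Flatten the MultiIndex result into an ordinary three-column table.
summary = pair_counts.reset_index(name="Count")
print(summary)
```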

Upvotes: 1
