Reputation: 90
Using Python, I would like to count how often each element of a list occurs across the rows of a dataframe, and aggregate the counts per element.
Here is the dataframe I am working with:
#Cluster_number_1 Cluster Type: terpene
#Cluster_number_2 Cluster Type: nrps
#Cluster_number_3 Cluster Type: terpene
#Cluster_number_4 Cluster Type: nrps
#Cluster_number_5 Cluster Type: nrps
#Cluster_number_6 Cluster Type: nrps
#Cluster_number_7 Cluster Type: t1pks
#Cluster_number_8 Cluster Type: other
#Cluster_number_9 Cluster Type: t1pks
#Cluster_number_10 Cluster Type: nrps
The corresponding list:
cluster_type = ["t1pks", "nrps", "terpene", "other"]
Desired output:
BGC_Class Count
t1pks 2
nrps 5
terpene 2
other 1
To help explain, borrowing from unix $ variables:
file = "cluster_counts.txt"
cluster_count = open(file, "w")
cluster_count.write($1 + "\t" + $2 + "\n")
Where $1 is the first element in the list, and $2 is the number of times it occurs, across all rows.
The dataframes won't exceed 100 lines, so efficiency is no issue.
Best, B.D.
I found something to get me started in "How to count the occurrences of a list item?":
>>> l = ["a","b","b"]
>>> [[x,l.count(x)] for x in set(l)]
[['a', 1], ['b', 2]]
However, this only counts occurrences of elements within the list itself.
I don't know how to count the occurrences of my list's elements in the dataframe.
Upvotes: 0
Views: 5249
Reputation: 90
Creating the appropriate header over the corresponding column did the trick:
import pandas as pd
df = pd.read_csv('test2_output copy.tsv', sep='\t', names=['Cluster Number', '#', 'Cluster_Type'])
df.Cluster_Type.value_counts()
Output:
t1pks 7
nrps 7
other 3
terpene 2
t1pks-nrps 1
indole 1
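To get the counts restricted to the `cluster_type` list from the question, in list order, and written out as a tab-separated file, something like the following might work. It is only a sketch: the small in-memory dataframe stands in for the real file and reuses the ten sample rows from the question, the column name `Cluster_Type` follows the code above, and the output file name is the one the question mentions.

```python
import pandas as pd

# Sample rows copied from the question; real data would come from pd.read_csv.
df = pd.DataFrame({
    "Cluster_Type": ["terpene", "nrps", "terpene", "nrps", "nrps",
                     "nrps", "t1pks", "other", "t1pks", "nrps"]
})
cluster_type = ["t1pks", "nrps", "terpene", "other"]

# Count every value, then keep only the listed types, in list order.
# A listed type that never occurs gets 0 instead of NaN.
counts = df["Cluster_Type"].value_counts().reindex(cluster_type, fill_value=0)

# Write a header row, then one "type<TAB>count" line per listed type.
with open("cluster_counts.txt", "w") as handle:
    handle.write("BGC_Class\tCount\n")
    for name, n in counts.items():
        handle.write(f"{name}\t{n}\n")
```

Using `reindex` also drops types that are not in the list (such as `indole` or `t1pks-nrps` in the output above), which may or may not be what is wanted.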
Thanks, 'The Unfun Cat'
Upvotes: 1
Reputation: 31908
Try
df.BGC_Class.value_counts()
If this does not work, please post your data :)
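For reference, here is a minimal sketch of how that call could be tried on the sample lines in the question. It assumes the raw lines look exactly as posted and that the class column is named `BGC_Class`; the `Cluster_Number` column name and the split on the `Cluster Type:` label are my own choices for illustration, not something the question specifies.

```python
import pandas as pd

# The ten sample lines from the question, as plain text.
raw = """#Cluster_number_1 Cluster Type: terpene
#Cluster_number_2 Cluster Type: nrps
#Cluster_number_3 Cluster Type: terpene
#Cluster_number_4 Cluster Type: nrps
#Cluster_number_5 Cluster Type: nrps
#Cluster_number_6 Cluster Type: nrps
#Cluster_number_7 Cluster Type: t1pks
#Cluster_number_8 Cluster Type: other
#Cluster_number_9 Cluster Type: t1pks
#Cluster_number_10 Cluster Type: nrps"""

# Split each line at the "Cluster Type:" label into (cluster id, class).
rows = [line.split("Cluster Type:") for line in raw.splitlines()]
df = pd.DataFrame(
    [(left.strip(), right.strip()) for left, right in rows],
    columns=["Cluster_Number", "BGC_Class"],  # hypothetical column names
)

# One count per distinct class, most frequent first.
counts = df.BGC_Class.value_counts()
print(counts)
```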
Upvotes: 1