I have a DataFrame in which I want to count the elements of two columns, both within and across rows. Each cell in these columns contains several different elements.
Example of my DataFrame:
```
   col_1          col_2
0  [AbC, DE, FG]  ['abc', 'de', 'fg']
1  [IJ]           ['ij']
2  [DE, IJ]       ['de', 'ij']
3  []             []
4  [AbC, de]      ['abc', 'de']
```
This DataFrame is a simplified representation. The difference between the two columns in my df is that the second column was created by converting all the elements of the first column to lowercase, using the following command:
```python
df['col_1'].astype(str).str.lower()
```
As you might notice, the elements in col_2 are wrapped in quotes ('') whereas the elements in col_1 are not. I tried to build a sample DataFrame to reproduce this here, but could not get one without the quotes, as in col_1.
I used the following command to count the individual elements:
```python
Counter(df['col_1'].explode())
```
It works for col_1, but not for col_2.
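A likely explanation, sketched below on the sample data: `astype(str)` converts each list into its string representation, so col_2 holds strings such as `"['abc', 'de', 'fg']"` rather than lists. `explode()` leaves values that are not list-like unchanged, so `Counter` ends up counting whole strings. pandas also prints the elements of a list without quotes but shows a string's own quote characters, which would explain the '' difference. Lowercasing inside each list keeps the cells as lists. The variable name `col_2_as_str` is illustrative only.

```python
from collections import Counter

import pandas as pd

# Sample data from the question (col_1 holds real Python lists)
df = pd.DataFrame({"col_1": [["AbC", "DE", "FG"], ["IJ"], ["DE", "IJ"], [], ["AbC", "de"]]})

# What astype(str) does: each list becomes a single string like "['abc', 'de', 'fg']"
col_2_as_str = df["col_1"].astype(str).str.lower()
print(type(col_2_as_str[0]))

# explode() has nothing to split in a string column, so Counter sees whole strings
print(Counter(col_2_as_str.explode()))

# Lowercase each element inside its list instead, so every cell stays a list
df["col_2"] = df["col_1"].apply(lambda items: [item.lower() for item in items])

# explode() turns the empty list into NaN; dropna() removes it before counting
counts = Counter(df["col_2"].explode().dropna())
print(counts)
```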
The expected frequency output is:
abc: 2
de: 3
fg: 1
ij: 2
The output can be a list or a dictionary, as long as it can be used for further analysis. I hope the DataFrame is reproducible. Any help or suggestions would be highly appreciated.
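Since either a dictionary or a list is acceptable, one pandas-only option (a sketch, not the only way) is to explode `col_1`, lowercase with the `.str` accessor, and count with `value_counts()`, which returns a Series that converts to a dict with `to_dict()`. Working from col_1 avoids the string problem in col_2 altogether.

```python
import pandas as pd

df = pd.DataFrame({"col_1": [["AbC", "DE", "FG"], ["IJ"], ["DE", "IJ"], [], ["AbC", "de"]]})

# One row per element; the empty list becomes NaN, which dropna() removes
elements = df["col_1"].explode().dropna()

# Lowercase, count (most frequent first), and convert the Series to a plain dict
freq = elements.str.lower().value_counts().to_dict()
print(freq)

# Or as a list of (element, count) pairs, most frequent first
freq_pairs = list(elements.str.lower().value_counts().items())
print(freq_pairs)
```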
If col_2 is needed then you can use the following:
```python
import pandas as pd
from collections import Counter

data = {
    'col_1': [
        ['AbC', 'DE', 'FG'],
        ['IJ'],
        ['DE', 'IJ'],
        [],
        ['AbC', 'de']
    ],
    'col_2': [
        ['abc', 'de', 'fg'],
        ['ij'],
        ['de', 'ij'],
        [],
        ['abc', 'de']
    ]
}
data_frame = pd.DataFrame(data)

# Flatten all sub-lists into a master list
master_list = [element for sublist in data_frame["col_2"].tolist() for element in sublist]

# Use Counter to group and count
print(Counter(master_list))
```
Output:
```
Counter({'de': 3, 'abc': 2, 'ij': 2, 'fg': 1})
```
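The code above builds col_2 directly as lists. If col_2 is instead the string column produced by `astype(str).str.lower()` in the question, one option is to parse each string back into a list with `ast.literal_eval` from the standard library before flattening. This is a sketch that assumes every cell is a valid Python list literal, which should hold for lists of ordinary strings converted with `str()`.

```python
import ast
from collections import Counter

import pandas as pd

# col_2 as the question created it: one string per row, not a list
df = pd.DataFrame({"col_1": [["AbC", "DE", "FG"], ["IJ"], ["DE", "IJ"], [], ["AbC", "de"]]})
df["col_2"] = df["col_1"].astype(str).str.lower()

# Parse each string such as "['abc', 'de']" back into a real list
col_2_lists = df["col_2"].apply(ast.literal_eval)

# Flatten and count, as in the answer above
master_list = [element for sublist in col_2_lists for element in sublist]
print(Counter(master_list))
```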
If col_2 isn't needed, you can call .lower() on each item while building the flattened list:
```python
import pandas as pd
from collections import Counter

data = {
    'col_1': [
        ['AbC', 'DE', 'FG'],
        ['IJ'],
        ['DE', 'IJ'],
        [],
        ['AbC', 'de']
    ]
}
data_frame = pd.DataFrame(data)

# Flatten all sub-lists into a master list, lowercasing each element
master_list = [element.lower() for sublist in data_frame["col_1"].tolist() for element in sublist]

# Use Counter to group and count
print(Counter(master_list))
```
Output:
```
Counter({'de': 3, 'abc': 2, 'ij': 2, 'fg': 1})
```
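The nested list comprehension can also be written with `itertools.chain.from_iterable`, which flattens one level of nesting lazily. This is a stylistic alternative with the same counts, shown as a sketch on the same sample data.

```python
from collections import Counter
from itertools import chain

import pandas as pd

data_frame = pd.DataFrame({"col_1": [["AbC", "DE", "FG"], ["IJ"], ["DE", "IJ"], [], ["AbC", "de"]]})

# Iterating the Series yields each list; chain.from_iterable yields their elements in turn.
# The generator expression lowercases them without building an intermediate list.
counts = Counter(element.lower() for element in chain.from_iterable(data_frame["col_1"]))
print(counts)
```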