Blackdynomite

Reputation: 431

Value Count String Occurrences for Pandas Column of Lists type in Python

I have a pandas column in which each value is a list of strings. If a list has multiple strings, they are separated by a comma and a newline ("\n"). Otherwise, the notation is simply: [\n "string" \n] (notice that each string has a \n preceding it).

Is it possible to count, across the entire column, the number of times each string occurs?

     Outcomes
0   [\n "springs"\n]
1   [\n "to_do"\n]
2   [\n "replace"\n]
3   [\n "null"\n]
4   [\n "finance"\n]
5   [\n "finance"\n]
6   [\n "project_management" ,\n "sprints...
7   [\n "to_do" ,\n "finance...
8   [\n "remote"\n]
9   [\n "get_it_done"\n]
10  [\n "get_it_done" ,\n "remote...

Target output should be like the following:

Outcomes      Value_count
springs            21
to_do              12
replace            2
null               1
finance            24
project_management 12
get_it_done        22

I tried something like the following, but I get an error saying the object type is not iterable:

pd.Series([x for item in df['Outcomes'] for x in item]).value_counts()

Upvotes: 0

Views: 1276

Answers (1)

jezrael

Reputation: 862581

Use Series.str.split with Series.explode and Series.str.strip first:

# the strip set includes the newline and the double quote so only the bare word remains
s = df['Outcomes'].str.split(',').explode().str.strip('[]\n "').value_counts()
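
For anyone who wants to try the split/explode/strip approach on a small frame, here is a minimal sketch. The sample rows are made up to mirror the format shown in the question, and it assumes each cell is a plain string containing literal newlines and double quotes, which is why both are included in the strip set here.

```python
import pandas as pd

# Hypothetical sample data in the question's format: every cell is a string, not a list.
df = pd.DataFrame({
    "Outcomes": [
        '[\n "springs"\n]',
        '[\n "finance"\n]',
        '[\n "to_do" ,\n "finance"\n]',
    ]
})

counts = (
    df["Outcomes"]
    .str.split(",")          # one list of raw pieces per row
    .explode()               # one row per piece
    .str.strip('[]\n "')     # drop brackets, newlines, spaces and quotes at both ends
    .value_counts()
)
print(counts)
```

Note that this splits on every comma, so it only works as long as the strings themselves never contain a comma.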

Or convert the values to lists with ast.literal_eval:

import ast
pd.Series([x.strip() for item in df['Outcomes'] for x in ast.literal_eval(item)]).value_counts()
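
If the result should look like the two-column table in the question, the counts can be turned into a DataFrame afterwards. The sketch below is one possible way to do it, not the only one. It uses made-up sample rows and assumes that missing cells (None/NaN) may be present, so it drops them before parsing; that is a guess at the cause of the "not iterable" error, not something confirmed in the question.

```python
import ast
import pandas as pd

# Hypothetical sample data; the None row stands in for a possible missing value.
df = pd.DataFrame({
    "Outcomes": [
        '[\n "springs"\n]',
        '[\n "to_do" ,\n "finance"\n]',
        None,
        '[\n "finance"\n]',
    ]
})

# Parse each string into a real Python list, skipping missing cells.
parsed = df["Outcomes"].dropna().map(ast.literal_eval)

result = (
    parsed.explode()                    # one row per string
    .str.strip()                        # remove stray whitespace around each string
    .value_counts()
    .rename_axis("Outcomes")            # name the index so it becomes a column
    .reset_index(name="Value_count")    # Series -> two-column DataFrame
)
print(result)
```

Unlike splitting on commas, ast.literal_eval parses the cell as a Python literal, so strings that contain commas are handled correctly.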

Upvotes: 1
