Reputation: 431
I have a pandas column where each value contains a list of strings, separated by a comma and a newline (" ,\n ") when the list has more than one string. Otherwise the notation is simply [\n "string"\n] (notice that each string is preceded by a \n).
Is it possible to count, across the entire column, the number of times each string occurs?
Outcomes
0 [\n "springs"\n]
1 [\n "to_do"\n]
2 [\n "replace"\n]
3 [\n "null"\n]
4 [\n "finance"\n]
5 [\n "finance"\n]
6 [\n "project_management" ,\n "sprints...
7 [\n "to_do" ,\n "finance...
8 [\n "remote"\n]
9 [\n "get_it_done"\n]
10 [\n "get_it_done" ,\n "remote...
Target output should be like the following:
Outcomes Value_count
springs 21
to_do 12
replace 2
null 1
finance 24
project_management 12
get_it_done 22
I tried something like the following, but I get an error because the object type is not iterable:
pd.Series([x for item in df['Outcomes'] for x in item]).value_counts()
Upvotes: 0
Views: 1276
Reputation: 862581
Use Series.str.split with Series.explode and Series.str.strip first:
# strip the brackets, quotes, spaces and newlines around each item
s = df['Outcomes'].str.split(',').explode().str.strip('[]" \n').value_counts()
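To make the split/explode/strip chain concrete, here is a minimal sketch on made-up sample data. It assumes each cell is a plain string containing real newline characters and double-quoted items, as the printed column suggests. Under that assumption the set of characters passed to str.strip needs to include the quote and the newline as well as the brackets and spaces. The sample values and the final reshaping step are illustrative, not taken from the original data.

```python
import pandas as pd

# Hypothetical sample data: each cell is a string, not a real Python list.
df = pd.DataFrame({
    "Outcomes": [
        '[\n "springs"\n]',
        '[\n "to_do" ,\n "finance"\n]',
        '[\n "finance"\n]',
    ]
})

counts = (
    df["Outcomes"]
    .str.split(",")         # one list of raw pieces per row
    .explode()              # one raw piece per row
    .str.strip('[]" \n')    # drop brackets, quotes, spaces and newlines
    .value_counts()
)

# Optional: reshape into the two-column layout shown in the question.
result = counts.rename_axis("Outcomes").reset_index(name="Value_count")
print(result)
```

Because str.strip only trims characters at the two ends of each piece, underscores and other characters inside a label such as "to_do" are left alone.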
Or convert the values to lists with ast.literal_eval:
import ast
pd.Series([x.strip() for item in df['Outcomes'] for x in ast.literal_eval(item)]).value_counts()
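The ast.literal_eval route can be sketched the same way. The "not iterable" error in the question might come from missing values in the column. That is only a guess, since the question does not show any, so this sketch adds a hypothetical helper, parse_cell, that skips anything that is not a string. The sample data is made up for illustration.

```python
import ast
import pandas as pd

# Hypothetical sample data, including one missing value.
df = pd.DataFrame({
    "Outcomes": [
        '[\n "get_it_done" ,\n "remote"\n]',
        None,
        '[\n "remote"\n]',
    ]
})

def parse_cell(cell):
    """Hypothetical helper: parse a list-like string, skip non-strings."""
    if not isinstance(cell, str):
        return []  # missing values (None/NaN) contribute no items
    return ast.literal_eval(cell)

counts = pd.Series(
    [x.strip() for cell in df["Outcomes"] for x in parse_cell(cell)]
).value_counts()
print(counts)
```

ast.literal_eval accepts the newlines inside the brackets because whitespace is allowed within a Python list literal, so no manual stripping of brackets or quotes is needed here.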
Upvotes: 1