Amanda Wishnie
Amanda Wishnie

Reputation: 11

Cleanse survey data and sum the responses to be in a python dataframe

Columns of my dataframe are the survey questions and the rows are the responses. The unique response choices were (1 - Strong Disagree, 2 - Disagree, 3 - Neutral, 4- Agree, 5- Strongly Agree). The rows have all the respondents selections and I ideally would like the columns to be the unique response choices with a sum of how many people selected that response per question and the rows to be each question.

Not sure how to get that - any suggestions?

My original data My original data

Tried to transpose which successfully made the questions rows but now I have 100s of unique "rows" for each response Tried to transpose which successfully made the questions rows but now I have 100s of unique "rows" for each response

Ultimate goal would be to group-by the questions and have the response choices under each question with the columns as a sum for each response Ultimate goal would be to group-by the questions and have the response choices under each question with the columns as a sum for each response

Upvotes: 1

Views: 716

Answers (2)

Amanda Wishnie
Amanda Wishnie

Reputation: 11

Here is the final result with your code and the group-by! final results!

Upvotes: 0

the_good_pony
the_good_pony

Reputation: 500

This is a bit quick and dirty but it might help

EDIT Updated to transform the tally into a pandas data frame

Setup example dataframe

df = pd.DataFrame ({ 'question_1' : ['1 - strongly agree','1 - strongly agree','2 - agree'], 
                     'question_2' : ['3 - neutral','2 - agree','2 - agree'],
                     'question_3' : ['1 - strongly agree','2 - agree','3 - neutral'],
                     'question_4' : ['4 - disagree','5 - strongly disgree','5 - strongly disgree'],
                     'question_5' : ['3 - neutral','2 - agree','2 - agree']} )

Get the value_counts() for each column

ls_flat = []
for col in  df.columns:
    for index in df[col].value_counts().index:
        print(col,index, df[col].value_counts()[index])
        ls_flat.append([col,index, df[col].value_counts()[index]])

Make this list into a dataframe

df_flat = pd.DataFrame(ls_flat)

Rename columns to something more meaningful

df_flat.columns = ['question', 'response', 'tally']

This will create a dataframe which looks like this

enter image description here

Upvotes: 1

Related Questions