How to convert untidy survey tables to dataframes in R

Question

I often work with survey data at my job that comes in horribly formatted excel files that have been designed for readability and not for any data analysis. I'm looking for a way to clean the data in R and knock it into a dataframe format of variables and observations.

I know there's a load of tutorials on data cleaning in R but from my experience, they mostly deal with data that is already in a machine-readable format, so any help with this would be appreciated!

Here's a dummy example of the raw survey that has this shape:

Are you male or female?

           Variable1 Variable2 Variable3 Variable4
Male       n%        n%        n%        n%
Female     n%        n%        n%        n%


How old are you?

           Variable1 Variable2 Variable3 Variable4
18-34      n%        n%        n%        n%
35+        n%        n%        n%        n%

And so on, with blank space being empty cells/rows, the whole of each survey question being in column A a couple rows above it's corresponding data table and all questions/datatables being on one worksheet.

Is there a way to convert to this with R code?

Question                Response Variable1 Variable2 Variable3 Variable4
Are you male or female? Male     n%        n%        n%        n%
Are you male or female? Female   n%        n%        n%        n%
How old are you?        18-34    n%        n%        n%        n%
How old are you?        35+      n%        n%        n%        n%

At the moment, I'm using some VBA code to do this in excel then reading in to R for further analysis/visualisation, but it would be nice be able to skip the excel phase and go straight to R.

Thanks!

How to convert untidy survey tables to dataframes in R

Answers (1)

Related Questions