Rstudyer

Reputation: 477

Convert data dictionary from Word to Excel with R

I received a data dictionary from the data provider. It covers hundreds of variables spread across several Word files and looks like this (screenshot not available).

To add this dictionary to my current dataset, I need to convert it to a particular format in Excel. For example, for the first variable, "intarm_actual", I would like to create these columns in a spreadsheet: "variable" holds the word at the top left; "label" holds the content of "label" (NA for this variable, but "tpe_lab" for the second one); "type" holds "string (str2)"; "value" holds "4"; "missing" holds "46/102"; and "tabulation" holds 46 "", 14 "RO", 14 "RV", 14 "TO", 14 "TV". Ideally, the result should look like this (screenshot not available).
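To make the target concrete, here is a minimal sketch of that layout as an R data frame, filled with the values given above for "intarm_actual". It assumes a CSV file is acceptable as the "Excel format", since Excel opens CSV directly; the file name is a placeholder. If a native .xlsx is needed, packages such as writexl (`write_xlsx()`) or openxlsx (`write.xlsx()`) can write one instead.

```r
# One row per variable, with the columns described in the question.
# The values are the ones given for "intarm_actual".
dict <- data.frame(
  variable   = "intarm_actual",
  label      = NA_character_,  # empty for this variable
  type       = "string (str2)",
  value      = "4",
  missing    = "46/102",
  tabulation = '46 "", 14 "RO", 14 "RV", 14 "TO", 14 "TV"',
  stringsAsFactors = FALSE
)

# Excel opens CSV files directly; the output path is only a placeholder.
out <- file.path(tempdir(), "data_dictionary.csv")
write.csv(dict, out, row.names = FALSE, na = "")
```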

Has anyone done this before and could offer some suggestions? I would appreciate any pointers: which packages to use, related posts or articles to read, or similar code I could learn from. Can the R package "labelled" handle this kind of task? Thanks a lot!

Update:

I used the package qdapTools to import one of the .docx files, and it looks like this (screenshot not available).

How can I extract the required words and put them in the right columns of my spreadsheet? Thanks!
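One way to answer this is a small regex-based parser. This is a sketch under assumptions: the import yields one string per paragraph (as far as I know, `qdapTools::read_docx()` returns a character vector), and each variable starts with its name on one line followed by a dashed rule, with indented `key: value` lines below, as in Stata's codebook output. The helpers `grab()`, `grab_tabulation()` and `parse_codebook()` are hypothetical names, and the sample text is made up to mirror the question (the second variable, `tpe`, and its values are invented).

```r
# First line matching `pattern` (which has one capture group), reduced to that group; NA if none.
grab <- function(block, pattern) {
  hit <- grep(pattern, block, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  trimws(sub(pattern, "\\1", hit[1]))
}

# Frequency lines after "tabulation:", up to the next blank line, joined with ", ".
grab_tabulation <- function(block) {
  start <- grep("^\\s*tabulation:", block)
  if (length(start) == 0) return(NA_character_)
  after <- block[-seq_len(start[1])]
  blank <- which(trimws(after) == "")
  if (length(blank) > 0) after <- after[seq_len(blank[1] - 1)]
  rows <- grep("^\\s*[0-9]+\\s+\\S", after, value = TRUE)
  paste(gsub("\\s+", " ", trimws(rows)), collapse = ", ")  # squeeze runs of spaces
}

parse_codebook <- function(lines) {
  lines <- sub("\\s+$", "", lines)        # drop trailing whitespace
  seps <- grep("^\\s*-{5,}$", lines)      # dashed rule under each variable name
  ends <- c(seps[-1] - 2, length(lines))  # each block stops before the next name line
  rows <- lapply(seq_along(seps), function(i) {
    block <- if (ends[i] > seps[i]) lines[(seps[i] + 1):ends[i]] else character(0)
    data.frame(
      variable   = sub("^\\s*(\\S+).*$", "\\1", lines[seps[i] - 1]),
      label      = grab(block, "^\\s*label:\\s*(.*)$"),
      type       = grab(block, "^\\s*type:\\s*(.*)$"),
      value      = grab(block, "^.*unique values:\\s*(\\S+).*$"),
      missing    = grab(block, "^.*missing[^:]*:\\s*(\\S+).*$"),
      tabulation = grab_tabulation(block),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Made-up input in the assumed layout.
lines <- c(
  "intarm_actual                                   (unlabeled)",
  "------------------------------------------------------------",
  "                  type:  string (str2)",
  "",
  "         unique values:  4                  missing \"\":  46/102",
  "",
  "            tabulation:  Freq.  Value",
  "                            46  \"\"",
  "                            14  \"RO\"",
  "                            14  \"RV\"",
  "                            14  \"TO\"",
  "                            14  \"TV\"",
  "",
  "tpe                                             (unlabeled)",
  "------------------------------------------------------------",
  "                  type:  numeric (byte)",
  "                 label:  tpe_lab",
  "",
  "         unique values:  2                  missing .:  0/102",
  "",
  "            tabulation:  Freq.  Numeric  Label",
  "                            60        1  A",
  "                            42        2  B",
  ""
)

dict <- parse_codebook(lines)
write.csv(dict, file.path(tempdir(), "codebook.csv"), row.names = FALSE, na = "")
```

Each regex targets one field, so if the real files differ (extra spaces, different key names), only the matching pattern needs adjusting.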

Update 2:
The issue has been solved in another way.

In case someone runs into a similar situation: 1) this type of codebook file is generated by Stata; 2) instead of parsing this complex text file, an alternative is to use the R package "codebook" to generate a new .csv codebook, which contains all of this information and more.
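The key idea of that solution is to build the dictionary from the data itself instead of from the Word text. The sketch below illustrates that idea in base R; it is not the codebook package's API, and `make_dictionary()` is a hypothetical helper. It assumes variable labels, if any, sit in a `"label"` attribute, which is where `haven::read_dta()` stores Stata variable labels. Note that this `var_label` is the variable label, not the value-label name (like "tpe_lab") shown in the Stata codebook. The toy column reuses the counts from the question.

```r
make_dictionary <- function(df) {
  rows <- lapply(names(df), function(v) {
    x <- df[[v]]
    lab <- attr(x, "label", exact = TRUE)  # variable label, if one is attached
    # NA counts as missing; in character columns "" does too, as in Stata's codebook.
    miss <- is.na(x)
    if (is.character(x)) miss <- miss | x == ""
    counts <- table(x, useNA = "no")       # keeps "" as its own category
    data.frame(
      variable   = v,
      var_label  = if (is.null(lab)) NA_character_ else as.character(lab)[1],
      type       = class(x)[1],
      value      = length(unique(x[!miss])),  # number of distinct non-missing values
      missing    = paste0(sum(miss), "/", length(x)),
      tabulation = paste0(counts, ' "', names(counts), '"', collapse = ", "),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Toy data matching the question: 46 blanks out of 102 rows.
df <- data.frame(
  intarm_actual = rep(c("", "RO", "RV", "TO", "TV"), times = c(46, 14, 14, 14, 14)),
  stringsAsFactors = FALSE
)
dict <- make_dictionary(df)
```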

Upvotes: 1

Views: 211

Answers (1)

LeaK

Reputation: 31

Assuming you are starting from scratch, I would recommend getting started with regular expressions in R. I often use the R package stringr to work with regular expressions, and its cheat sheet is a good reference. Regular expressions let you, e.g., select the word following a ":".
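As a small illustration of "select the word following a ':'", this base-R sketch captures the first run of non-space characters after the first colon; `after_colon()` is a hypothetical helper and the input strings are made up. With stringr, `stringr::str_match(x, ":\\s*(\\S+)")[, 2]` returns the same capture.

```r
# First "word" (run of non-space characters) after the first colon; NA if there is none.
after_colon <- function(x) {
  m <- regmatches(x, regexec(":\\s*(\\S+)", x))
  vapply(m, function(hit) if (length(hit) >= 2) hit[2] else NA_character_, character(1))
}

x <- c("type:  string (str2)", "label:  tpe_lab", "no colon here")
after_colon(x)
```

Note that `\\S+` stops at the first space, so "string (str2)" yields just "string"; use `(.*)` instead to keep the rest of the line.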

I have never worked with Word documents in R, but there are packages that read Word documents into R. Just Google them. :) I am sure they also come with good instructions.
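Packages such as officer (`read_docx()` with `docx_summary()`) or qdapTools (`read_docx()`) handle this. To show what they work with: a .docx file is a zip archive whose body lives in `word/document.xml`, with each paragraph wrapped in `<w:p>...</w:p>`. The base-R sketch below relies only on that; it ignores text boxes, footnotes and fields, and the function names and sample XML are hypothetical.

```r
# Turn the XML body of a .docx into one string per paragraph.
docx_xml_to_paragraphs <- function(xml) {
  xml <- gsub("<w:tab/>", "\t", xml, fixed = TRUE)   # keep Word tabs as \t
  paras <- strsplit(xml, "</w:p>", fixed = TRUE)[[1]]
  paras <- paras[grepl("<w:p[ >]", paras)]           # keep chunks that open a paragraph
  text <- gsub("<[^>]+>", "", paras)                 # strip the remaining tags
  # Undo the basic XML escapes; &amp; goes last so "&amp;lt;" stays a literal "&lt;".
  for (esc in list(c("&lt;", "<"), c("&gt;", ">"), c("&quot;", "\""),
                   c("&apos;", "'"), c("&amp;", "&"))) {
    text <- gsub(esc[1], esc[2], text, fixed = TRUE)
  }
  text
}

# Read word/document.xml straight out of a .docx file (path is a placeholder).
read_docx_paragraphs <- function(path) {
  exdir <- tempfile("docx_")
  utils::unzip(path, files = "word/document.xml", exdir = exdir)
  xml <- readLines(file.path(exdir, "word", "document.xml"), encoding = "UTF-8", warn = FALSE)
  docx_xml_to_paragraphs(paste(xml, collapse = ""))
}

# Tiny hand-written XML in the same shape as a real document body.
xml <- paste0(
  '<?xml version="1.0"?><w:document><w:body>',
  '<w:p><w:r><w:t>type:</w:t></w:r><w:r><w:tab/><w:t>string (str2)</w:t></w:r></w:p>',
  '<w:p><w:pPr/><w:r><w:t>R &amp; D</w:t></w:r></w:p>',
  '<w:sectPr><w:pgSz/></w:sectPr></w:body></w:document>'
)
docx_xml_to_paragraphs(xml)
```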

Another issue you might encounter is encoding. If the text is not read in correctly, e.g. you see strange character combinations, encoding is most likely the source of the problem.
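A related trap with Word text: its AutoFormat feature often turns straight quotes into curly ones and may insert non-breaking spaces, so a regex looking for `"RO"` can silently fail. This sketch converts the text to UTF-8 with `iconv()` and then normalizes those characters; `normalize_text()` is a hypothetical helper, and the source encoding (`from`) is an assumption you would need to check for your files.

```r
# Convert to UTF-8, then replace Word "smart" punctuation with plain ASCII.
normalize_text <- function(x, from = "UTF-8") {
  x <- iconv(x, from = from, to = "UTF-8", sub = "?")  # unconvertible bytes become "?"
  x <- gsub("\u201c", "\"", x, fixed = TRUE)  # left double quote
  x <- gsub("\u201d", "\"", x, fixed = TRUE)  # right double quote
  x <- gsub("\u2018", "'", x, fixed = TRUE)   # left single quote
  x <- gsub("\u2019", "'", x, fixed = TRUE)   # right single quote
  x <- gsub("\u00a0", " ", x, fixed = TRUE)   # non-breaking space
  x
}

normalize_text("14\u00a0\u201cRO\u201d")
normalize_text("caf\xe9", from = "latin1")
```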

Once you have looked at these things and started working on your own code, you will be able to ask more precise questions.

Upvotes: 1
