tsouchlarakis

Reputation: 1619

Regex to Remove Everything but Numbers, Letters and Spaces in R

How can I remove these pesky backslashes in R? I've scoured the web and Stack Overflow for a way to get rid of them, with no luck.

I've tried a lot of different ways, but I think the only one that I can get working will be to remove every character that is not a number, letter or space using regular expressions and gsub(). Here is my string:

"_kMDItemOwnerUserID = 99kMDItemAlternateNames = ( \"(500) Days of Summer     (2009).m4v\")kMDItemAudioBitRate = 163kMDItemAudioChannelCount =     2kMDItemAudioEncodingApplication = \"HandBrake 0.9.4 2009112300\"kMDItemCodecs =     ( \"H.264\", AAC, \"QuickTime Text\")"

As you can see it is very messy, with backslashes and quotation marks all over the place. Ultimately, what I want to do is extract the movie name: '(500) Days of Summer (2009)'.

What is a regular expression that will match everything but numbers, letters and spaces?

Thank you very much in advance for your help.

Upvotes: 1

Views: 4321

Answers (1)

Tim Biegeleisen

Reputation: 520968

Try replacing the character class [^[:alnum:] ], which matches any character that is not a letter, number, or space, with an empty string:

gsub("[^[:alnum:] ]", "", x)

Full code:

x <- "_kMDItemOwnerUserID = 99kMDItemAlternateNames = ( \"(500) Days of Summer     (2009).m4v\")kMDItemAudioBitRate = 163kMDItemAudioChannelCount =     2kMDItemAudioEncodingApplication = \"HandBrake 0.9.4 2009112300\"kMDItemCodecs =     ( \"H.264\", AAC, \"QuickTime Text\")"

gsub("[^[:alnum:] ]", "", x)
[1] "kMDItemOwnerUserID  99kMDItemAlternateNames   500 Days of Summer     2009m4vkMDItemAudioBitRate  163kMDItemAudioChannelCount      2kMDItemAudioEncodingApplication  HandBrake 094 2009112300kMDItemCodecs       H264 AAC QuickTime Text"
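
Since the stated goal is the movie name, stripping every non-alphanumeric character may be more than is needed, because it also removes the parentheses that belong to the title. Below is a minimal sketch of a more targeted approach. It assumes the title always sits in the first quoted value after kMDItemAlternateNames and ends in .m4v, which holds for the sample string but may not hold for other files. It also checks that the string holds no real backslashes: the \" sequences are how R prints and parses quote characters inside a double-quoted string.

```r
x <- "_kMDItemOwnerUserID = 99kMDItemAlternateNames = ( \"(500) Days of Summer     (2009).m4v\")kMDItemAudioBitRate = 163kMDItemAudioChannelCount =     2kMDItemAudioEncodingApplication = \"HandBrake 0.9.4 2009112300\"kMDItemCodecs =     ( \"H.264\", AAC, \"QuickTime Text\")"

# The string contains no literal backslash; \" is just an escaped quote.
has_backslash <- grepl("\\", x, fixed = TRUE)

# Capture the text between the opening quote after kMDItemAlternateNames
# and the ".m4v" extension (assumed layout; adjust for other extensions).
title <- sub('.*kMDItemAlternateNames = \\( "(.*?)\\.m4v".*', "\\1", x, perl = TRUE)

# Collapse runs of spaces into a single space.
title <- gsub(" +", " ", title)

has_backslash
title
```

Calling cat(x) instead of printing x shows the string without the escape characters, which is a quick way to see what it really contains.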

Upvotes: 5
