Methexis

Reputation: 2861

Isolate part of a string in R

Below is an example of a string I have ingested into R:

General\\Contingency\\Import\\Import_Manual\\New\\ADC170001A13_Loc.txt

I am trying to isolate the 'ADC170001A13' part. I have tried substring and also a gsub to remove everything apart from that part of the string, but I get the error below:

Error in gsub(clean, "", TextLOCfiles) : 
invalid regular expression '\\Fs01 \DepartmentFolders$\General\Contingency\Import\Import_Manual\New\', reason 'Trailing   backslash'
In addition: Warning message:
In gsub(clean, "", TextLOCfiles) :
argument 'pattern' has length > 1 and only the first element will be used

Upvotes: 2

Views: 245

Answers (4)

G. Grothendieck

Reputation: 269586

Try this:

library( tools )
basename( file_path_sans_ext( TextLOCfiles ) )

or without loading any package:

sub( "\\.[^.]*$", "", basename( TextLOCfiles ) )

These solutions do not require that you know the file name or extension and also work if there is no extension.
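Note that both calls keep the "_Loc" suffix. A minimal sketch of the full extraction follows, assuming the wanted ID is always followed by a trailing "_Loc" (that assumption and the sample value are mine, taken from the single path in the question). As far as I know, basename() only treats "\" as a separator on Windows, so the sketch normalises the separators first to behave the same on other platforms:

```r
library(tools)  # ships with R; provides file_path_sans_ext()

# Sample value mirroring the path in the question (single backslashes in the data).
TextLOCfiles <- "General\\Contingency\\Import\\Import_Manual\\New\\ADC170001A13_Loc.txt"

# Replace literal backslashes with forward slashes (fixed = TRUE: no regex involved).
normalised <- gsub("\\", "/", TextLOCfiles, fixed = TRUE)

# Drop the directory part and the extension.
stem <- file_path_sans_ext(basename(normalised))

# Assumption: the ID is everything before a trailing "_Loc".
id <- sub("_Loc$", "", stem)
print(id)
```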

Upvotes: 2

damienfrancois

Reputation: 59110

You can capture the needed part with gsub and parentheses:

> gsub(".*\\\\(\\w+)_.*", "\\1", TextLOCfiles)
[1] "ADC170001A13"
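One thing to watch with this approach: when the pattern does not match, sub() and gsub() return the input unchanged rather than signalling a failure. A small sketch of how that could be guarded against; the second path is a hypothetical input added only for illustration:

```r
paths <- c(
  "General\\Contingency\\Import\\Import_Manual\\New\\ADC170001A13_Loc.txt",
  "no_separator_here.txt"  # hypothetical input with no backslash
)

# Same pattern as above: greedy prefix up to the last backslash,
# then capture word characters up to an underscore.
pattern <- ".*\\\\(\\w+)_.*"

ids <- sub(pattern, "\\1", paths)

# Mark entries where the pattern did not match instead of keeping the raw input.
ids[!grepl(pattern, paths)] <- NA_character_
print(ids)
```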

Upvotes: 2

Konrad Rudolph

Reputation: 545588

The easiest solution is to use regmatches:

> rxmatch = regexpr('(?<=\\\\)\\w+(?=_Loc\\.)', TextLOCfiles, perl = TRUE)
> regmatches(TextLOCfiles, rxmatch)
[1] "ADC170001A13"

perl = TRUE is required in order to get the zero-width assertions, as mentioned by Simon in the comments.
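Because the lookbehind and lookahead only assert context without consuming it, the match contains neither the backslash nor "_Loc.". When applied to several paths, regmatches() with regexpr() returns only the elements that matched, so the result can be shorter than the input. A sketch that keeps the positions aligned; the second path is a hypothetical example, not from the question:

```r
paths <- c(
  "General\\Contingency\\Import\\Import_Manual\\New\\ADC170001A13_Loc.txt",
  "General\\Other\\readme.txt"  # hypothetical path without the _Loc suffix
)

m <- regexpr("(?<=\\\\)\\w+(?=_Loc\\.)", paths, perl = TRUE)

# regexpr() reports -1 for elements without a match.
found <- m != -1

# Pre-fill with NA so non-matching inputs keep their slot.
ids <- rep(NA_character_, length(paths))
ids[found] <- regmatches(paths, m)
print(ids)
```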

Upvotes: 2

datawookie

Reputation: 6534

This looks like a file path. If this is true then you can simply use basename() as follows:

sub("\\.txt$", "", basename(TextLOCfiles))
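One caveat when stripping an extension with sub(): in a regular expression an unescaped "." matches any character, so it is safer to escape the dot and anchor the pattern at the end of the string (or to use fixed = TRUE). A short sketch with a hypothetical file name chosen to show the difference:

```r
fname <- "my_txt_file.txt"  # hypothetical name, not from the question

# Unescaped dot: the first "any character + txt" is "_txt", so the wrong part is removed.
loose <- sub(".txt", "", fname)

# Escaped dot, anchored at the end: only the extension is removed.
strict <- sub("\\.txt$", "", fname)

# Literal matching: removes the first literal ".txt".
literal <- sub(".txt", "", fname, fixed = TRUE)

print(c(loose = loose, strict = strict, literal = literal))
```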

Upvotes: 2
