pinegulf

Reputation: 1396

Importing a multi-space-delimited file

Good day,

I'm trying to import data from a text file into R. Properly formatted data is no issue, but what should I do when the delimiter is a double or triple space?

For example, I have the following data in text format:

Var1    Var2    var3
30000   Sedan   Model 2014
30000   CHEVROLET   Corvette Stingray
....

Here the columns are delimited by three spaces instead of \t (tab). Also, the data in a column may itself contain single spaces.

How can I make R read this directly?

I've tried read.table, but its separator argument (sep) does not seem to accept multiple characters, and regular expressions are not supported (as far as I know). What does work is to read the data in as tab-delimited and split it once it's in a data frame, but this feels silly.
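For comparison, base R can apply a regular expression before anything reaches a data frame: read the raw lines, split each one on runs of two or more spaces, and build the data frame from the pieces. This is a minimal sketch, assuming every column separator is at least two spaces, no value contains two consecutive spaces, and the first line is the header. The variable names and the placeholder file name are illustrative, not from the question.

```r
# Inline copy of the sample data; for a real file use lines <- readLines("data.txt")
# ("data.txt" is a placeholder name)
raw <- "Var1    Var2    var3
30000   Sedan   Model 2014
30000   CHEVROLET   Corvette Stingray"
lines <- strsplit(raw, "\n", fixed = TRUE)[[1]]

# Trim stray leading/trailing spaces, then split on runs of 2+ spaces;
# single spaces inside values such as "Model 2014" are kept
fields <- strsplit(trimws(lines), " {2,}")

# Stack the data rows (everything after the header) into a character matrix
df <- as.data.frame(do.call(rbind, fields[-1]), stringsAsFactors = FALSE)
names(df) <- fields[[1]]

# Convert columns that look numeric (Var1 here) from character
df[] <- lapply(df, type.convert, as.is = TRUE)
df
```

If some line splits into a different number of pieces, rbind recycles the shorter vector with only a warning, so checking lengths(fields) first is a cheap safeguard.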

Upvotes: 2

Views: 1718

Answers (2)

Dhawal Kapil

Reputation: 2757

The exact problem is that your column separator (two or more spaces) and the separator within column values (a single space) overlap.

To read this correctly, separate the two.

The regex for two or more spaces is \s{2,}.

Use this regex with gsub to convert the column separators into commas (,).

Then read the converted text directly with read.csv.

>rawText="Var1    Var2    var3
30000   Sedan   Model 2014
30000   CHEVROLET   Corvette Stingray"

>cleanedText=gsub("\\s{2,}",",",rawText)

>df<-read.csv(text=cleanedText)

> df
   Var1      Var2              var3
1 30000     Sedan        Model 2014
2 30000 CHEVROLET Corvette Stingray
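One caveat with this approach: \s also matches line breaks, so a line ending in two or more spaces would be merged with the next line, and a value that already contains a comma would be split into an extra column. A variant, sketched here under the assumption that no value contains a tab, trims each line, matches literal spaces only, and uses a tab as the replacement delimiter so that read.delim can parse the result:

```r
rawText <- "Var1    Var2    var3
30000   Sedan   Model 2014
30000   CHEVROLET   Corvette Stingray"

# Work line by line so newlines are never touched by the regex
lines <- strsplit(rawText, "\n", fixed = TRUE)[[1]]

# Drop leading/trailing spaces, then turn runs of 2+ literal spaces into tabs
cleaned <- gsub(" {2,}", "\t", trimws(lines))

# read.delim uses sep = "\t" and header = TRUE by default;
# a character vector passed as text is read one element per line
df <- read.delim(text = cleaned, stringsAsFactors = FALSE)
df
```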

Upvotes: 2

Richard Telford

Reputation: 9923

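You can use tidyr::separate to split the data into columns on three spaces. First read each line into a single column by choosing a separator that never occurs in the data (here %) and skipping the header line; then split that column.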

df <- read.table(text = "Var1    Var2    var3
30000   Sedan   Model 2014
30000   CHEVROLET   Corvette Stingray", sep = "%", skip = 1)

tidyr::separate(df, V1, c("Var1", "Var2", "Var3"), sep = "\\s{3}")

   Var1      Var2              Var3
1 30000     Sedan        Model 2014
2 30000 CHEVROLET Corvette Stingray

Upvotes: 3
