Reputation: 1396
Good day,
I'm trying to import data from a text file into R. Properly formatted data is no issue, but what should I do when the delimiter is a double or triple space?
For example, I have the following data in text format:
Var1   Var2   var3
30000   Sedan Model   2014
30000   CHEVROLET   Corvette Stingray
....
Here, instead of being tab (\t) delimited, the data is delimited by three spaces (   ). Also, the data in a column may itself contain single spaces.
How can I make R accept this directly?
I've tried read.table, but the separator argument (sep="") does not seem to accept multiple characters, and regular expressions are not supported (as far as I know). What does seem to work is reading the data in as tab delimited and splitting it once it's in a data frame, but this is silly.
Upvotes: 2
Views: 1718
Reputation: 2757
The exact problem is that your column separator (two or more spaces) and the separator within column values (a single space) overlap.
To read this correctly, separate the two:
1. The regex for two or more whitespace characters is \s{2,} (written "\\s{2,}" inside an R string).
2. Use this regex with gsub to convert the column separators into commas (,).
3. Read the converted text directly via read.csv.
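To see what step 2 does before reading anything, a quick console check on a single line from the question is enough. The variable name `line` is just illustrative; the point is that only the three-space runs are replaced, while the single space inside "Sedan Model" survives:

```r
# One data line from the question, with three spaces between columns
line <- "30000   Sedan Model   2014"

# Runs of two or more whitespace characters become commas;
# the single space inside "Sedan Model" is left untouched
converted <- gsub("\\s{2,}", ",", line)
print(converted)
```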
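rawText <- "Var1   Var2   var3
30000   Sedan Model   2014
30000   CHEVROLET   Corvette Stingray"
# Turn each run of two or more whitespace characters into a comma
cleanedText <- gsub("\\s{2,}", ",", rawText)
# read.csv defaults to sep = "," and header = TRUE
df <- read.csv(text = cleanedText)
df
   Var1        Var2              var3
1 30000 Sedan Model              2014
2 30000   CHEVROLET Corvette Stingray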
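For data coming from an actual file, the same idea can be applied line by line with readLines(). The sketch below is a variation rather than the answer's exact method: it assumes a tab never occurs inside a value and uses it as the intermediate delimiter (so values containing commas such as "1,000" would not break the parse), and it matches literal spaces only. The temp file stands in for your real file path, which is hypothetical here:

```r
# Hypothetical input file: written to a temp path so the sketch runs as-is
path <- tempfile(fileext = ".txt")
writeLines(c("Var1   Var2   var3",
             "30000   Sedan Model   2014",
             "30000   CHEVROLET   Corvette Stingray"), path)

# Read the raw lines, then turn each run of 2+ literal spaces into a tab.
# Assumption: a tab never appears inside a value.
lines <- readLines(path)
tabbed <- gsub(" {2,}", "\t", lines)

# read.delim expects tab-separated text with a header row;
# quote = "" keeps stray quote characters in values from being treated as quotes
df <- read.delim(text = tabbed, quote = "", stringsAsFactors = FALSE)
str(df)
```

Because readLines() has already split the input into lines, the regex can never merge two rows, even if a line ends in trailing spaces.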
Upvotes: 2
Reputation: 9923
You can use tidyr::separate to split the data into columns on three spaces.
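# sep = "%" never occurs in the data, so each line lands whole in column V1;
# skip = 1 drops the header line
df <- read.table(text = "Var1   Var2   var3
30000   Sedan Model   2014
30000   CHEVROLET   Corvette Stingray", sep = "%", skip = 1)
tidyr::separate(df, V1, c("Var1", "Var2", "Var3"), sep = "\\s{3}")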
   Var1        Var2              Var3
1 30000 Sedan Model              2014
2 30000   CHEVROLET Corvette Stingray
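If tidyr isn't available, the same read-then-split idea can be sketched in base R with strsplit(). This swaps in a different technique from the answer's tidyr::separate, and it assumes every line splits into the same number of fields (here, three) and that the first line is the header:

```r
# The same data, header line included this time
lines <- c("Var1   Var2   var3",
           "30000   Sedan Model   2014",
           "30000   CHEVROLET   Corvette Stingray")

# Split every line on exactly three whitespace characters
parts <- strsplit(lines, "\\s{3}")

# Assumes each line yields the same number of fields; stack them into a matrix
m <- do.call(rbind, parts)

# First row holds the column names; the remaining rows are the data
df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df) <- m[1, ]

# Every column starts out as character; convert the numeric one explicitly
df$Var1 <- as.integer(df$Var1)
str(df)
```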
Upvotes: 3