John
John

Reputation: 69

R How to separate a string based white spaces but keeping the decimal dot?

Could you please help me to separate a string column into 7 columns by keeping the decimal numbers?

Here is one example:

library(dplyr)
library(tidyr)
library(stringr)

x <- data.frame(Flat = c("2000M01  XZ ELDL U K EER 213.9", "2000M02  XY MLML O T RRE 255.6" , "2000M03  UY LEEE M P SSE 259.4" ))  
x %>% separate(Flat, c("A1","B2","C3","D4","E5","F6","Value")

The output is:

     A1   B2  C3   D4 E5 F6 Value
1 2000M01 XZ ELDL  U  K EER   213
2 2000M02 XY MLML  O  T RRE   255
3 2000M03 UY LEEE  M  P SSE   259
Warning message:
Expected 7 pieces. Additional pieces discarded in 3 rows [1, 2, 3].

instead of the desired output:

   A1     B2 C3    D4 E5 F6   Value
1 2000M01 XZ ELDL  U  K EER   213.9
2 2000M02 XY MLML  O  T RRE   255.6
3 2000M03 UY LEEE  M  P SSE   259.4

I tried many "sep = " options in the separate function, but nothing helps.

Thank you,

John

Upvotes: 0

Views: 96

Answers (5)

ThomasIsCoding
ThomasIsCoding

Reputation: 101393

A data.table option with tstrsplit

type.convert(
    setDT(x)[, setNames(
        tstrsplit(Flat, "\\s+"),
        c("A1", "B2", "C3", "D4", "E5", "F6", "Value")
    )],
    as.is = TRUE
)

which gives

        A1 B2   C3 D4 E5  F6 Value
1: 2000M01 XZ ELDL  U  K EER 213.9
2: 2000M02 XY MLML  O  T RRE 255.6
3: 2000M03 UY LEEE  M  P SSE 259.4

Upvotes: 2

Wimpel
Wimpel

Reputation: 27732

data.table way

library(data.table)
ans <- data.table()[,c("A1","B2","C3","D4","E5","F6","Value") := tstrsplit(x$Flat, "[ ]+")]
#         A1 B2   C3 D4 E5  F6 Value
# 1: 2000M01 XZ ELDL  U  K EER 213.9
# 2: 2000M02 XY MLML  O  T RRE 255.6
# 3: 2000M03 UY LEEE  M  P SSE 259.4

Upvotes: 2

Anoushiravan R
Anoushiravan R

Reputation: 21918

This is certainly not so brilliant as dear @akrun's, but it will also do the trick:

library(tidyr)

x %>%
  extract(Flat, c("A1","B2","C3","D4","E5","F6","Value"), 
          "(\\d+M\\d+)  ([[:upper:]]+) ([[:upper:]]+) ([[:upper:]]) ([[:upper:]]+) ([[:upper:]]+) (\\d+\\.\\d)")

       A1 B2   C3 D4 E5  F6 Value
1 2000M01 XZ ELDL  U  K EER 213.9
2 2000M02 XY MLML  O  T RRE 255.6
3 2000M03 UY LEEE  M  P SSE 259.4

Upvotes: 2

akrun
akrun

Reputation: 887128

We could do this in base R with read.table

read.table(text = x$Flat, header = FALSE,
       col.names =c("A1","B2","C3","D4","E5","F6","Value"))
       A1 B2   C3 D4 E5  F6 Value
1 2000M01 XZ ELDL  U  K EER 213.9
2 2000M02 XY MLML  O  T RRE 255.6
3 2000M03 UY LEEE  M  P SSE 259.4

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 388982

Pass the sep argument as whitespace in separate. Also use convert = TRUE to change the Value column to numeric automatically.

tidyr::separate(x, Flat, 
      c("A1","B2","C3","D4","E5","F6","Value"), sep = '\\s+', convert = TRUE)

#       A1 B2   C3 D4 E5  F6 Value
#1 2000M01 XZ ELDL  U  K EER 213.9
#2 2000M02 XY MLML  O  T RRE 255.6
#3 2000M03 UY LEEE  M  P SSE 259.4

Upvotes: 3

Related Questions