Reputation: 69
Could you please help me to separate a string column into 7 columns by keeping the decimal numbers?
Here is one example:
library(dplyr)
library(tidyr)
library(stringr)
x <- data.frame(Flat = c("2000M01 XZ ELDL U K EER 213.9", "2000M02 XY MLML O T RRE 255.6" , "2000M03 UY LEEE M P SSE 259.4" ))
x %>% separate(Flat, c("A1","B2","C3","D4","E5","F6","Value")
The output is:
A1 B2 C3 D4 E5 F6 Value
1 2000M01 XZ ELDL U K EER 213
2 2000M02 XY MLML O T RRE 255
3 2000M03 UY LEEE M P SSE 259
Warning message:
Expected 7 pieces. Additional pieces discarded in 3 rows [1, 2, 3].
instead of the desired output:
A1 B2 C3 D4 E5 F6 Value
1 2000M01 XZ ELDL U K EER 213.9
2 2000M02 XY MLML O T RRE 255.6
3 2000M03 UY LEEE M P SSE 259.4
I tried many "sep = "
options in the separate function, but nothing helps.
Thank you,
John
Upvotes: 0
Views: 96
Reputation: 101393
A data.table
option with tstrsplit
type.convert(
setDT(x)[, setNames(
tstrsplit(Flat, "\\s+"),
c("A1", "B2", "C3", "D4", "E5", "F6", "Value")
)],
as.is = TRUE
)
which gives
A1 B2 C3 D4 E5 F6 Value
1: 2000M01 XZ ELDL U K EER 213.9
2: 2000M02 XY MLML O T RRE 255.6
3: 2000M03 UY LEEE M P SSE 259.4
Upvotes: 2
Reputation: 27732
data.table
way
library(data.table)
ans <- data.table()[,c("A1","B2","C3","D4","E5","F6","Value") := tstrsplit(x$Flat, "[ ]+")]
# A1 B2 C3 D4 E5 F6 Value
# 1: 2000M01 XZ ELDL U K EER 213.9
# 2: 2000M02 XY MLML O T RRE 255.6
# 3: 2000M03 UY LEEE M P SSE 259.4
Upvotes: 2
Reputation: 21918
This is certainly not so brilliant as dear @akrun's, but it will also do the trick:
library(tidyr)
x %>%
extract(Flat, c("A1","B2","C3","D4","E5","F6","Value"),
"(\\d+M\\d+) ([[:upper:]]+) ([[:upper:]]+) ([[:upper:]]) ([[:upper:]]+) ([[:upper:]]+) (\\d+\\.\\d)")
A1 B2 C3 D4 E5 F6 Value
1 2000M01 XZ ELDL U K EER 213.9
2 2000M02 XY MLML O T RRE 255.6
3 2000M03 UY LEEE M P SSE 259.4
Upvotes: 2
Reputation: 887128
We could do this in base R
with read.table
read.table(text = x$Flat, header = FALSE,
col.names =c("A1","B2","C3","D4","E5","F6","Value"))
A1 B2 C3 D4 E5 F6 Value
1 2000M01 XZ ELDL U K EER 213.9
2 2000M02 XY MLML O T RRE 255.6
3 2000M03 UY LEEE M P SSE 259.4
Upvotes: 2
Reputation: 388982
Pass the sep
argument as whitespace in separate
. Also use convert = TRUE
to change the Value
column to numeric automatically.
tidyr::separate(x, Flat,
c("A1","B2","C3","D4","E5","F6","Value"), sep = '\\s+', convert = TRUE)
# A1 B2 C3 D4 E5 F6 Value
#1 2000M01 XZ ELDL U K EER 213.9
#2 2000M02 XY MLML O T RRE 255.6
#3 2000M03 UY LEEE M P SSE 259.4
Upvotes: 3