Reputation: 85
VERY new to R and struggling with knowing exactly what to ask, have found a similar question here How to split a character vector into data frame? but this has fixed length, and I've been unable to adjust for my problem
I've got some data in an array in R
TEST <- c("Value01:100|Value02:200|Value03:300|","Value04:1|Value05:2|",
"StillAValueButNamesAreNotConsistent:12345.6789|",
"AlsoNotAllLinesAreTheSameLength:1|")
The data is stored in pairs, and I'm looking to split out into a dataframe as such:
Variable Value
Value01 100
Value02 200
Value03 300
Value04 1
Value05 2
StillAValueButNamesAreNotConsistent 12345.6789
AlsoNotAllLinesAreTheSameLength 1
The Variable name is a string and the value will always be a number
Any help would be great!
Thanks
Upvotes: 7
Views: 1495
Reputation: 3833
With help of strsplit
and unlist
function. Each command is shown with output below.
Input
TEST
# [1] "Value01:100|Value02:200|Value03:300|"
# [2] "Value04:1|Value05:2|"
# [3] "StillAValueButNamesAreNotConsistent:12345.6789|"
# [4] "AlsoNotAllLinesAreTheSameLength:1|"
Splitting by |
and then by :
my_list <- strsplit(unlist(strsplit(TEST, "|", fixed = TRUE)), ":", fixed = TRUE)
my_list
# [[1]]
# [1] "Value01" "100"
# [[2]]
# [1] "Value02" "200"
# [[3]]
# [1] "Value03" "300"
# [[4]]
# [1] "Value04" "1"
# [[5]]
# [1] "Value05" "2"
# [[6]]
# [1] "StillAValueButNamesAreNotConsistent" "12345.6789"
# [[7]]
# [1] "AlsoNotAllLinesAreTheSameLength" "1"
Converting above list to data.frame
df <- data.frame(matrix(unlist(my_list), ncol = 2, byrow=TRUE))
df
# X1 X2
# 1 Value01 100
# 2 Value02 200
# 3 Value03 300
# 4 Value04 1
# 5 Value05 2
# 6 StillAValueButNamesAreNotConsistent 12345.6789
# 7 AlsoNotAllLinesAreTheSameLength 1
Colnames to dataframe
names(df) <- c("Variable", "Value")
df
# Variable Value
# 1 Value01 100
# 2 Value02 200
# 3 Value03 300
# 4 Value04 1
# 5 Value05 2
# 6 StillAValueButNamesAreNotConsistent 12345.6789
# 7 AlsoNotAllLinesAreTheSameLength 1
Upvotes: 0
Reputation: 6073
With Base R:
(I've broken out each step to hopefully make the code clear)
# your data
myvec <- c("Value01:100|Value02:200|Value03:300|","Value04:1|Value05:2|",
"StillAValueButNamesAreNotConsistent:12345.6789|",
"AlsoNotAllLinesAreTheSameLength:1|")
# convert into one long string
all_text_str <- paste0(myvec, collapse="")
# split the string by "|"
all_text_vec <- unlist(strsplit(all_text_str, split="\\|"))
# split each "|"-group by ":"
data_as_list <- strsplit(all_text_vec, split=":")
# collect into a dataframe
df <- do.call(rbind, data_as_list)
# clean up the dataframe by adding names and converting value to numeric
names(df) <- c("variable", "value")
df$value <- as.numeric(df$value)
Upvotes: 0
Reputation: 99321
We can do it in base R with one line. Just change the |
characters to line breaks then use :
as the sep
value in read.table()
. You can also set column names there too.
read.table(text = gsub("\\|", "\n", TEST), sep = ":",
col.names = c("Variable", "Value"))
# Variable Value
# 1 Value01 100.00
# 2 Value02 200.00
# 3 Value03 300.00
# 4 Value04 1.00
# 5 Value05 2.00
# 6 StillAValueButNamesAreNotConsistent 12345.68
# 7 AlsoNotAllLinesAreTheSameLength 1.00
Upvotes: 5
Reputation: 20085
One can use tidyr
based solution. Convert vector TEST
to a data.frame and remove the last |
from each row as that doesn't carry any meaning as such.
Now, use tidyr::separate_rows
to expand rows based on |
and then separate data in 2 columns using tidyr::separate
function.
library(dplyr)
library(tidyr)
data.frame(TEST) %>%
mutate(TEST = gsub("\\|$","",TEST)) %>%
separate_rows(TEST, sep = "[|]") %>%
separate(TEST, c("Variable", "Value"), ":")
# Variable Value
# 1 Value01 100
# 2 Value02 200
# 3 Value03 300
# 4 Value04 1
# 5 Value05 2
# 6 StillAValueButNamesAreNotConsistent 12345.6789
# 7 AlsoNotAllLinesAreTheSameLength 1
Upvotes: 5