Reputation: 85

R split array into Data frame

VERY new to R and struggling with knowing exactly what to ask, have found a similar question here How to split a character vector into data frame? but this has fixed length, and I've been unable to adjust for my problem

I've got some data in an array in R

TEST <- c("Value01:100|Value02:200|Value03:300|","Value04:1|Value05:2|",
            "StillAValueButNamesAreNotConsistent:12345.6789|",
              "AlsoNotAllLinesAreTheSameLength:1|")

The data is stored in pairs, and I'm looking to split out into a dataframe as such:

Variable Value
Value01    100
Value02    200
Value03    300
Value04    1
Value05    2
StillAValueButNamesAreNotConsistent   12345.6789
AlsoNotAllLinesAreTheSameLength     1

The Variable name is a string and the value will always be a number

Any help would be great!

Thanks

Upvotes: 7

Answers (4)

Sowmya S. Manian

Reputation: 3833

With help of strsplit and unlist function. Each command is shown with output below.

Input

 TEST
 # [1] "Value01:100|Value02:200|Value03:300|"           
 # [2] "Value04:1|Value05:2|"                           
 # [3] "StillAValueButNamesAreNotConsistent:12345.6789|"
 # [4] "AlsoNotAllLinesAreTheSameLength:1|"

Splitting by | and then by :

 my_list <- strsplit(unlist(strsplit(TEST, "|", fixed = TRUE)), ":", fixed = TRUE)
 my_list
 # [[1]]
 # [1] "Value01" "100"    

 # [[2]]
 # [1] "Value02" "200"    

 # [[3]]
 # [1] "Value03" "300"    

 # [[4]]
 # [1] "Value04" "1"      

 # [[5]]
 # [1] "Value05" "2"      

 # [[6]]
 # [1] "StillAValueButNamesAreNotConsistent" "12345.6789"                         

 # [[7]]
 # [1] "AlsoNotAllLinesAreTheSameLength" "1"

Converting above list to data.frame

 df <- data.frame(matrix(unlist(my_list), ncol = 2, byrow=TRUE))
 df
 #                                    X1         X2
 # 1                             Value01        100
 # 2                             Value02        200
 # 3                             Value03        300
 # 4                             Value04          1
 # 5                             Value05          2
 # 6 StillAValueButNamesAreNotConsistent 12345.6789
 # 7     AlsoNotAllLinesAreTheSameLength          1

Colnames to dataframe

 names(df) <- c("Variable", "Value")
 df
 #                              Variable      Value
 # 1                             Value01        100
 # 2                             Value02        200
 # 3                             Value03        300
 # 4                             Value04          1
 # 5                             Value05          2
 # 6 StillAValueButNamesAreNotConsistent 12345.6789
 # 7     AlsoNotAllLinesAreTheSameLength          1

Upvotes: 0

DanY

Reputation: 6073

With Base R:

(I've broken out each step to hopefully make the code clear)

# your data
myvec <- c("Value01:100|Value02:200|Value03:300|","Value04:1|Value05:2|",
           "StillAValueButNamesAreNotConsistent:12345.6789|",
           "AlsoNotAllLinesAreTheSameLength:1|")

# convert into one long string
all_text_str <- paste0(myvec, collapse="")

# split the string by "|"
all_text_vec <- unlist(strsplit(all_text_str, split="\\|"))

# split each "|"-group by ":"
data_as_list <- strsplit(all_text_vec, split=":")

# collect into a dataframe
df <- do.call(rbind, data_as_list)

# clean up the dataframe by adding names and converting value to numeric
names(df) <- c("variable", "value")
df$value <- as.numeric(df$value)

Upvotes: 0

Rich Scriven

Reputation: 99321

We can do it in base R with one line. Just change the | characters to line breaks then use : as the sep value in read.table(). You can also set column names there too.

read.table(text = gsub("\\|", "\n", TEST), sep = ":", 
    col.names = c("Variable", "Value"))

#                              Variable    Value
# 1                             Value01   100.00
# 2                             Value02   200.00
# 3                             Value03   300.00
# 4                             Value04     1.00
# 5                             Value05     2.00
# 6 StillAValueButNamesAreNotConsistent 12345.68
# 7     AlsoNotAllLinesAreTheSameLength     1.00

Upvotes: 5

MKR

Reputation: 20085

One can use tidyr based solution. Convert vector TEST to a data.frame and remove the last | from each row as that doesn't carry any meaning as such.

Now, use tidyr::separate_rows to expand rows based on | and then separate data in 2 columns using tidyr::separate function.

library(dplyr)
library(tidyr)

data.frame(TEST) %>%
  mutate(TEST = gsub("\\|$","",TEST)) %>%
  separate_rows(TEST, sep = "[|]") %>%
  separate(TEST, c("Variable", "Value"), ":")

#                              Variable      Value
# 1                             Value01        100
# 2                             Value02        200
# 3                             Value03        300
# 4                             Value04          1
# 5                             Value05          2
# 6 StillAValueButNamesAreNotConsistent 12345.6789
# 7     AlsoNotAllLinesAreTheSameLength          1

Upvotes: 5

R split array into Data frame

Answers (4)

Related Questions