Reputation: 47
I have a text file in which data is stored as given below:
{{2,3,4},{1,3},{4},{1,2} .....}
I want to remove the brackets and convert it to a two-column format, where the first column is the bracket number and the second is the term:
1 2
1 3
1 4
2 1
2 3
3 4
4 1
4 2
So far I have read the file:
tab <- read.table("test.txt",header=FALSE,sep="}")
This gives a dataframe
V1 V2 V3 V4
1 {{2,3,4 {1,3 {4 {1,2 .....
How should I proceed?
Upvotes: 1
Views: 65
Reputation: 12703
Data:
tab <- read.table(text=' V1 V2 V3 V4
1 {{2,3,4 {1,3 {4 {1,2
2 {{2,3,4 {1,3 {4 {1,2 ')
Code: using gsub, remove "{" and split the string on ",", then make a data frame. The column names are removed. Finally, the list of data frames in df1 is combined using rbindlist.
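df1 <- lapply( seq_along(tab), function(x) {
  # strip every "{" from column x, then split each string on ","
  temp <- data.frame( x, strsplit( gsub( "{", "", tab[[x]], fixed = TRUE ), split = "," ),
                      stringsAsFactors = FALSE)
  # drop the column names so the pieces can be bound by position
  colnames(temp) <- NULL
  temp
} )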
Output:
data.table::rbindlist(df1)
# V1 V2 V3
# 1: 1 2 2
# 2: 1 3 3
# 3: 1 4 4
# 4: 2 1 1
# 5: 2 3 3
# 6: 3 4 4
# 7: 4 1 1
# 8: 4 2 2
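For the single-line input in the question, the same idea (strip the braces, split on commas) can also be applied to the raw string without going through read.table. The following is only a sketch: the names raw, groups, terms and result are made up for illustration, and it assumes every term is a non-negative integer.

```r
# Hypothetical input: the one-line string from the question
raw <- "{{2,3,4},{1,3},{4},{1,2}}"

# Pull out each comma-separated run of digits, one per inner bracket
groups <- regmatches(raw, gregexpr("[0-9]+(,[0-9]+)*", raw))[[1]]

# Split every group into its individual terms
terms <- strsplit(groups, ",", fixed = TRUE)

# Repeat the bracket number once per term and flatten the terms
result <- data.frame(
  bracket = rep(seq_along(terms), lengths(terms)),
  term    = as.integer(unlist(terms))
)
print(result)
```

Because the bracket index is built with rep() and lengths(), brackets of different sizes are handled without padding.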
Upvotes: 1
Reputation: 887078
We read it with readLines, then remove the {} with strsplit, convert it to a two-column data frame with an index, and reshape to 'long' format with separate_rows:
library(tidyverse)
v1 <- setdiff(unlist(strsplit(lines, "[{}]")), c("", ","))
tibble(index = seq_along(v1), Col = v1) %>%
separate_rows(Col, convert = TRUE)
# A tibble: 8 x 2
# index Col
# <int> <int>
#1 1 2
#2 1 3
#3 1 4
#4 2 1
#5 2 3
#6 3 4
#7 4 1
#8 4 2
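One thing to watch for: setdiff also drops duplicated elements, so if two brackets happen to hold identical contents, one of them would disappear and the later indices would shift. A small sketch of a duplicate-safe filter, using a made-up input with a repeated bracket (the names parts, v1_setdiff and v1_keep are only for illustration):

```r
# Hypothetical input in which the bracket {1,3} appears twice
lines <- "{{1,3},{1,3},{4}}"

# Split on either brace; this leaves empty strings and lone commas behind
parts <- unlist(strsplit(lines, "[{}]"))

# setdiff removes "" and ",", but it also de-duplicates the result
v1_setdiff <- setdiff(parts, c("", ","))

# Logical indexing removes the same debris and keeps repeated brackets
v1_keep <- parts[!parts %in% c("", ",")]

print(v1_setdiff)
print(v1_keep)
```

If the data can contain repeated brackets, v1_keep can be used in place of v1 in the pipeline above.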
Or a base R method would be to replace the "," after each "}" with another delimiter (";"), split on that delimiter, split each piece by "," into a list, and stack it into a two-column data.frame:
v1 <- scan(text=gsub("[{}]", "", gsub("},", ";", lines)), what = "", sep=";", quiet = TRUE)
stack(setNames(lapply(strsplit(v1, ","), as.integer), seq_along(v1)))[2:1]
Data:
lines <- readLines(textConnection("{{2,3,4},{1,3},{4},{1,2}}"))
# or, reading from a file
lines <- readLines("yourfile.txt")
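To make the base R steps easier to follow, here is a sketch that breaks the one-liner into its intermediate values. The names step1, step2, groups and out are only for illustration, and strsplit is swapped in for scan to split on ";".

```r
lines <- "{{2,3,4},{1,3},{4},{1,2}}"

# 1. Turn the "}," between brackets into a separate delimiter
step1 <- gsub("},", ";", lines, fixed = TRUE)

# 2. Drop every remaining brace
step2 <- gsub("[{}]", "", step1)

# 3. One string per bracket
groups <- strsplit(step2, ";", fixed = TRUE)[[1]]

# 4. Split each bracket into integers, name the list by bracket number,
#    and stack it; [2:1] puts the index column first
out <- stack(setNames(lapply(strsplit(groups, ","), as.integer),
                      seq_along(groups)))[2:1]
print(out)
```

Note that stack returns the index column (ind) as a factor; as.integer(as.character(out$ind)) converts it if a numeric column is needed.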
Upvotes: 1