UPSCFever

Reputation: 47

Text processing on a data frame in R

I have a text file in which the data is stored as shown below:

{{2,3,4},{1,3},{4},{1,2} .....}

I want to remove the brackets and convert the data to a two-column format, where the first column is the bracket number and the second is the term:

1 2
1 3
1 4
2 1
2 3
3 4
4 1
4 2

So far I have read the file:

tab <- read.table("test.txt",header=FALSE,sep="}")

This gives a data frame:

      V1      V2   V3    V4
1 {{2,3,4  {1,3   {4   {1,2  .....

How should I proceed?

Upvotes: 1

Views: 65

Answers (2)

Sathish

Reputation: 12703

Data:

tab <- read.table(text='     V1      V2   V3    V4
1 {{2,3,4  {1,3   {4   {1,2 
2 {{2,3,4  {1,3   {4   {1,2 ')

Code: use gsub to remove each { and strsplit to split the strings on commas, then build one data frame per column of tab. The column names of each piece are removed. Finally, the list of data frames in df1 is combined using rbindlist.

df1 <- lapply( seq_along(tab), function(x)  {
  temp <- data.frame( x, strsplit( gsub( "{", "", tab[[x]], fixed = TRUE ), split = "," ),
                      stringsAsFactors = FALSE)
  colnames(temp) <- NULL
  temp
} )

Output:

data.table::rbindlist(df1)
#    V1 V2 V3
# 1:  1  2  2
# 2:  1  3  3
# 3:  1  4  4
# 4:  2  1  1
# 5:  2  3  3
# 6:  3  4  4
# 7:  4  1  1
# 8:  4  2  2
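
The output above has two term columns (V2 and V3) because the sample data has two rows. For the single-row data frame in the question, the same gsub and strsplit idea gives the requested two columns. This is only a minimal sketch: the column names bracket and term are my own, the sample values are typed in by hand, and do.call(rbind, ...) stands in for rbindlist to keep it in base R.

```r
# Single-row data frame shaped like the one shown in the question
# (values typed in by hand; V1..V4 are the names read.table assigns)
tab <- data.frame(V1 = "{{2,3,4", V2 = "{1,3", V3 = "{4", V4 = "{1,2",
                  stringsAsFactors = FALSE)

pieces <- lapply(seq_along(tab), function(i) {
  # Remove every "{" and split what is left on commas
  terms <- strsplit(gsub("{", "", tab[[i]], fixed = TRUE), ",", fixed = TRUE)[[1]]
  # Drop empty strings, which appear if a field starts with a stray comma
  terms <- terms[nzchar(terms)]
  data.frame(bracket = rep(i, length(terms)), term = as.integer(terms))
})

# Stack the per-column pieces into one two-column data frame
result <- do.call(rbind, pieces)
print(result)
```

The nzchar filter is there because a real read.table(sep = "}") result may keep the comma that follows each closing brace at the start of the next field.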

Upvotes: 1

akrun

Reputation: 887078

We read the file with readLines, split on the braces with strsplit to get one string per bracket group, build a two-column data frame with an index, and reshape it to 'long' format with separate_rows:

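library(tidyverse)
# `lines` is created in the data section at the end of this answer
v1 <- unlist(strsplit(lines, "[{}]"))
# drop the empty strings and lone commas left between the braces;
# setdiff() would also drop repeated groups such as a second "1,3"
v1 <- v1[!v1 %in% c("", ",")]
tibble(index = seq_along(v1), Col = v1) %>%
       separate_rows(Col, convert = TRUE)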
# A tibble: 8 x 2
#  index   Col
#  <int> <int>
#1     1     2
#2     1     3
#3     1     4
#4     2     1
#5     2     3
#6     3     4
#7     4     1
#8     4     2

Alternatively, a base R method: replace each }, with another delimiter (;), strip the remaining braces, split on ; to get one string per group, split each group by , into a list, and stack it into a two-column data.frame:

v1 <- scan(text=gsub("[{}]", "", gsub("},", ";", lines)), what = "", sep=";", quiet = TRUE)
stack(setNames(lapply(strsplit(v1, ","), as.integer), seq_along(v1)))[2:1]
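
To see what each step of the base R approach produces, the sketch below breaks the same calls into named intermediates. The names delimited, bare, groups, terms and result are mine, the input is the sample line from the question, and fixed = TRUE is added to the first gsub so that }, is matched literally.

```r
# Sample line in the format from the question
lines <- "{{2,3,4},{1,3},{4},{1,2}}"

# 1. Replace the "}," between groups with ";" so that groups and terms
#    are separated by different delimiters
delimited <- gsub("},", ";", lines, fixed = TRUE)

# 2. Strip the remaining braces, leaving "2,3,4;1,3;4;1,2"
bare <- gsub("[{}]", "", delimited)

# 3. Read one string per group by splitting on ";"
groups <- scan(text = bare, what = "", sep = ";", quiet = TRUE)

# 4. Split each group on "," and convert the terms to integers
terms <- lapply(strsplit(groups, ",", fixed = TRUE), as.integer)

# 5. Name the list elements 1, 2, ... and stack them; [2:1] puts the
#    group index (column "ind") before the term (column "values")
result <- stack(setNames(terms, seq_along(terms)))[2:1]
print(result)
```

Note that stack returns the ind column as a factor; as.integer(as.character(result$ind)) converts it if a numeric index is needed.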

Data:

# sample input matching the question
lines <- readLines(textConnection("{{2,3,4},{1,3},{4},{1,2}}"))
# or, when reading from a file, use this instead
# lines <- readLines("yourfile.txt")

Upvotes: 1
