Idignatius

Reputation: 21

Separating rows containing values

So I have a huge data set of over 500,000 different rows I need to separate. Each row is a set of numbers such as this:

P040120000000000000000001001101210000000120000000000

The important thing to note here is the "P04012" section which corresponds to one specific table. A few hundred thousand items down, the code transforms into this:

P051120150000000000000002158101110000000210000184380

With "P05112015" meaning something different. The first 8-10 characters of each string correspond to a certain table, but right now they are all lumped into one huge dataset with one column and 500,000 rows. How do I separate the rows into their specific tables based on these leading characters?

I plan to use read.fwf to split the number strings into columns, so really at this point it is simply figuring out how to split them into tables.

Upvotes: 2

Views: 66

Answers (2)

johnson-shuffle

Reputation: 1023

Here's one possibility that might work for you which uses read.fwf():

options(stringsAsFactors = FALSE)

# fake data file
tf <- tempfile()
cat(
  "P040120000000000000000001001101210000000120000000000",
  "P051120150000000000000002158101110000000210000184380",
  sep = "\n",
  file = tf)

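# get table identifiers using read.fwf()
# (read both fields as character so the long digit string is not rounded to a number)
ids <- read.fwf(tf, widths = c(10, 42), colClasses = "character")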

# drop trailing zeros (not sure if this is important)
ids <- gsub("0+$", "", ids$V1)
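To go from the identifiers to separate tables, one option is base R's split(). This is only a sketch: it assumes the table key really is the first 10 characters with trailing zeros removed (which would wrongly shorten a key that genuinely ends in 0), and the third row below is made up for illustration.

```r
# Two sample rows from the question plus one made-up row
rows <- c(
  "P040120000000000000000001001101210000000120000000000",
  "P051120150000000000000002158101110000000210000184380",
  "P040120000000000000000009999999990000000120000000000"  # hypothetical extra row
)

# Table identifier: first 10 characters with trailing zeros dropped
ids <- sub("0+$", "", substr(rows, 1, 10))

# One list element per identifier, each holding that table's raw rows
tables <- split(rows, ids)

names(tables)    # the distinct table identifiers
lengths(tables)  # how many rows landed in each table
```

Each element of tables can then be written to its own file and parsed with read.fwf() using the widths for that particular table.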

Upvotes: 2

Joker

Reputation: 755

As I understand your question, your data looks like the following, say in a CSV file:

RowID,Name
P040120000000000000000001001101210000000120000000000,A
P040130000000000000000001001101210000000120000000000,B
P040140000000000000000001001101210000000120000000000,C
P040150000000000000000001001101210000000120000000000,D

You want to create a table based on the first few characters. Below is my R code:

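# read the CSV shown above
FF <- read.csv('/home/my/k.csv', header = TRUE)
# table key: the first 6 characters of each RowID
S <- substr(FF$RowID, 1, 6)
# one single-cell table per row, pairing its key with its Name
T1 <- table(S[1], as.character(FF$Name[1]))
T2 <- table(S[2], as.character(FF$Name[2]))
T3 <- table(S[3], as.character(FF$Name[3]))
T4 <- table(S[4], as.character(FF$Name[4]))
# print the four tables
T1; T2; T3; T4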

If you have a lot of rows, you can create the tables in a for loop instead. Hope that answers your question.
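A rough sketch of that loop, assuming the key is the first 6 characters of RowID as in the code above. The data frame is built inline so the snippet runs on its own; its third row is made up so that one key has more than one row.

```r
# Hypothetical data in the same shape as the CSV above (RowID, Name)
FF <- data.frame(
  RowID = c("P040120000000000000000001001101210000000120000000000",
            "P040130000000000000000001001101210000000120000000000",
            "P040120000000000000000001001101210000000120000000001"),  # made-up row
  Name = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Table key for every row
prefix <- substr(FF$RowID, 1, 6)

# Build one data frame per distinct key, stored in a named list
tables <- list()
for (p in unique(prefix)) {
  tables[[p]] <- FF[prefix == p, , drop = FALSE]
}

names(tables)             # keys found in the data
nrow(tables[["P04012"]])  # number of rows that share this key
```

Keeping the pieces in a named list avoids having to create T1, T2, ... by hand, and it works the same way whether there are four keys or four hundred.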

Upvotes: 1
