Reputation: 21
So I have a huge data set of over 500,000 different rows I need to separate. Each row is a set of numbers such as this:
P040120000000000000000001001101210000000120000000000
The important thing to note here is the "P04012" section, which corresponds to one specific table. A few hundred thousand items down, the code transforms into this:
P051120150000000000000002158101110000000210000184380
With "P05112015" meaning something different. The first 8-10 characters of each string of numbers correspond to a certain table, but right now they are all lumped into one huge dataset with one column and 500,000 rows. How do I separate the rows into their specific tables based on these leading characters?
I plan to use read.fwf to split the number strings into columns, so at this point it is simply a matter of figuring out how to split them into tables.
Upvotes: 2
Views: 66
Reputation: 1023
Here's one possibility that might work for you, which uses read.fwf():
options(stringsAsFactors = FALSE)
# fake data file
tf <- tempfile()
# cat() writes to the file and returns NULL, so there is nothing to assign
cat(
"P040120000000000000000001001101210000000120000000000",
"P051120150000000000000002158101110000000210000184380",
sep = "\n",
file = tf)
# get table identifiers using read.fwf()
ids <- read.fwf(tf, widths = c(10, 42))
# drop trailing zeros (not sure if this is important)
ids <- gsub("0+$", "", ids$V1)
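To go from the identifiers to separate tables, one option is base R's split(), which groups a vector by a key. This is a minimal sketch, assuming the same two sample rows and the same 10-character prefix width used above; the names `rows` and `tables` are illustrative only. Note that stripping trailing zeros would also shorten an identifier that legitimately ends in 0, so check that against the real table codes first.

```r
# Sample rows from the question, held in a character vector
rows <- c(
  "P040120000000000000000001001101210000000120000000000",
  "P051120150000000000000002158101110000000210000184380"
)

# Table identifier: first 10 characters with trailing zeros removed
# (the 10-character width is an assumption carried over from above)
ids <- sub("0+$", "", substr(rows, 1, 10))

# One list element per identifier, each holding the rows of that table
tables <- split(rows, ids)

names(tables)
```

For the real file, `rows` could be filled with readLines() on the file path, and each element of `tables` could then be parsed into columns separately.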
Upvotes: 2
Reputation: 755
As I understand your question, your data looks like the following, say in a CSV file:
RowID,Name
P040120000000000000000001001101210000000120000000000,A
P040130000000000000000001001101210000000120000000000,B
P040140000000000000000001001101210000000120000000000,C
P040150000000000000000001001101210000000120000000000,D
You want to create a table based on the first few characters. Below is my R code:
rm(list = ls())
FF <- read.csv('/home/my/k.csv', header = TRUE)
# first six characters of each RowID identify the table
S <- substr(FF$RowID, 1, 6)
T1 <- table(S[1], as.character(FF$Name[1]))
T2 <- table(S[2], as.character(FF$Name[2]))
T3 <- table(S[3], as.character(FF$Name[3]))
T4 <- table(S[4], as.character(FF$Name[4]))
T1; T2; T3; T4
You can create the tables using a for loop if you have a lot of rows.
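The loop version might look something like the sketch below. It assumes the four sample rows shown earlier, read inline with read.csv(text = ...) instead of from the file, and it stores the results in a list called `tabs` (a name chosen here for illustration) rather than in separate T1, T2, ... variables.

```r
# Sample data from above, read inline instead of from a file
FF <- read.csv(text = "RowID,Name
P040120000000000000000001001101210000000120000000000,A
P040130000000000000000001001101210000000120000000000,B
P040140000000000000000001001101210000000120000000000,C
P040150000000000000000001001101210000000120000000000,D",
  stringsAsFactors = FALSE)

# Six-character prefix that identifies the table
S <- substr(FF$RowID, 1, 6)

# Build one table per row in a loop and keep them in a list
tabs <- vector("list", nrow(FF))
for (i in seq_len(nrow(FF))) {
  tabs[[i]] <- table(S[i], FF$Name[i])
}
names(tabs) <- S
```

If several rows share a prefix, grouping the whole data frame with split(FF, S) may be closer to what the question asks for than one table per row.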
Hope I answered your question.
Upvotes: 1