Kostas

Read column numbers from a list and create CSV files containing the columns that correspond to those numbers

I have a dataset saved in a CSV file called 'extremes' (30 columns and about 2000 rows). I perform a cluster analysis and use capture.output to save the output to a CSV file. Specifically, I do:

    capture.output(inf,file="Clusters.csv", append=TRUE)

where 'inf' holds the output returned by the analysis; 'inf' is a list.

The output saved in the CSV file ('Clusters.csv') is the following (as it appears in the R console):

$assign
 [1] 1 2 3 1 1 1 1 2 1 4 1 4 1 2 4 2 3 5 4 1 2 2 2 1 1 1 1 1 1 1

$list
$list$cluster.1
 [1]  1  4  5  6  7  9 11 13 20 24 25 26 27 28 29 30

$list$cluster.2
[1]  2  8 14 16 21 22 23

$list$cluster.3
[1]  3 17

$list$cluster.4
[1] 10 12 15 19

$list$cluster.5
[1] 18


$num
cluster.1 cluster.2 cluster.3 cluster.4 cluster.5 
   16         7         2         4         1 
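Worth noting: capture.output() writes the printed text of inf, not a real CSV, so read.csv() cannot rebuild the list structure from 'Clusters.csv'. A minimal sketch of an alternative, assuming the analysis can be rerun: save the list object itself with saveRDS() and restore it with readRDS(). The mock inf (only its $list part) and the file name Clusters.rds are hypothetical.

```r
# Hypothetical mock of 'inf'; only the $list part shown above is reproduced
inf <- list(list = list(
  cluster.1 = c(1, 4, 5, 6, 7, 9, 11, 13, 20, 24, 25, 26, 27, 28, 29, 30),
  cluster.2 = c(2, 8, 14, 16, 21, 22, 23)
))

# capture.output() keeps only the printed text, one string per console line
txt <- capture.output(inf)
class(txt)  # "character"

# saveRDS()/readRDS() store and restore the R object itself
rds_file <- file.path(tempdir(), "Clusters.rds")
saveRDS(inf, rds_file)
inf2 <- readRDS(rds_file)

# The column numbers of cluster 2, ready to use as a column index
inf2$list$cluster.2
```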

Based on the analysis, I also get a parameter called 'NumberClusters', which indicates the optimal number of clusters (for this particular dataset it takes the value 2).

What I want to achieve is to read the columns of the CSV file 'extremes' that make up the first cluster (i.e., 1 4 5 6 7 9 11 13 20 24 25 26 27 28 29 30) and save them in a data.frame (and maybe in a CSV file named 'Cluster1'), then read the columns of 'extremes' that make up the second cluster (i.e., 2 8 14 16 21 22 23) and save them in a data.frame (and maybe in a CSV file named 'Cluster2'). I can then continue my analysis using the two datasets 'Cluster1' and 'Cluster2'. My main problem, I think, is finding a way to read the column numbers that make up each cluster (e.g., for cluster 1, columns 1 4 5 6 7 9 11 13 20 24 25 26 27 28 29 30) from the file 'Clusters.csv'. I believe I will then be able to read the data in these columns of 'extremes.csv' using

read.csv("extremes.csv")[, c(1, 4, 5, 6, 7, 9, 11, 13, 20, 24, 25, 26, 27, 28, 29, 30)]
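If rerunning the analysis is not an option, the cluster members can be parsed back out of the text in 'Clusters.csv'. A rough sketch, assuming the file has exactly the layout shown above; parse_clusters() is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: recover cluster members from text written by
# capture.output(inf, file = "Clusters.csv"), assuming the layout shown above.
parse_clusters <- function(lines) {
  # Header lines such as "$list$cluster.1"
  hdr <- grep("^\\$list\\$cluster\\.[0-9]+[[:space:]]*$", lines)
  out <- list()
  for (h in hdr) {
    nm <- trimws(sub("^\\$list\\$", "", lines[h]))
    vals <- integer(0)
    i <- h + 1
    # Collect every following line that starts with an index such as "[1]"
    while (i <= length(lines) && grepl("^[[:space:]]*\\[[0-9]+\\]", lines[i])) {
      body <- sub("^[[:space:]]*\\[[0-9]+\\]", "", lines[i])
      vals <- c(vals, scan(text = body, what = integer(), quiet = TRUE))
      i <- i + 1
    }
    out[[nm]] <- vals
  }
  out
}

# A few lines in the same format as Clusters.csv
example_lines <- c(
  "$list",
  "$list$cluster.1",
  " [1]  1  4  5  6  7  9 11 13 20 24 25 26 27 28 29 30",
  "",
  "$list$cluster.2",
  "[1]  2  8 14 16 21 22 23",
  ""
)
clusters <- parse_clusters(example_lines)
clusters$cluster.2
```

With the real file this would be clusters <- parse_clusters(readLines("Clusters.csv")), after which read.csv("extremes.csv")[, clusters$cluster.1] gives the columns of cluster 1.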

I have also tried the 'xlsx' package, without success.

Any help would be greatly appreciated, because I have been stuck on this for some time now.

My data looks like this (this is a small sample; in fact I have 30 columns of financial indices and 2019 rows of daily returns). I hope this helps.

Food    Beer    Smoke   Games   Books   Hshld   Clths
0.57    1.23    1.19    0.54    -0.19   0.31    0.52
0.48    0.57    -0.89   -0.23   -0.25   0.29    -0.26
-0.55   -0.75   -0.8    -0.41   -0.2    -0.29   -0.61
 0.6    -0.1    0.31    1.16    1.14    0.74    0.72
-0.44   -1.34   -1.73   -0.16   0.22    -0.97   -0.96
-0.25   -0.21   -0.07   -0.73   -0.4    -0.56   -0.8
0.11    -0.94   -0.3    -0.38   -0.07   -0.38   -0.24
-1.34   -2.12   -1.54   -1.52   -0.68   -1.72   -1.91
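For experimenting, the sample above can be read straight from the text with read.table() (whitespace-separated, header in the first row), and columns are then picked by position with an ordinary index vector. The selection c(1, 4, 5) is only an illustration, since the sample has 7 of the 30 columns.

```r
# The sample data above, pasted as text (first four rows only)
sample_txt <- "
Food    Beer    Smoke   Games   Books   Hshld   Clths
0.57    1.23    1.19    0.54    -0.19   0.31    0.52
0.48    0.57    -0.89   -0.23   -0.25   0.29    -0.26
-0.55   -0.75   -0.8    -0.41   -0.2    -0.29   -0.61
 0.6    -0.1    0.31    1.16    1.14    0.74    0.72
"
extremes <- read.table(text = sample_txt, header = TRUE)

# Select columns by position; drop = FALSE keeps a data.frame even for one column
idx <- c(1, 4, 5)
cluster1 <- extremes[, idx, drop = FALSE]
names(cluster1)  # "Food" "Games" "Books"
```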

I ran your code (your mock example) and got

> cluster1
Null data.table (0 rows and 0 cols)

The same happens for cluster2.

I then ran the following on my own dataset and got the same message (i.e., Null data.table (0 rows and 0 cols)):

output <- read.csv("Clusters.csv", header = TRUE)
output <- list()
cluster.data <- matrix(extremes, nrow = 2019, ncol = 30, byrow = TRUE) 
DT <- as.data.table(cluster.data)
cluster1 <- DT[, c(output$list$cluster1), with = FALSE]
cluster1
cluster2 <- DT[, c(output$list$cluster2), with = FALSE]
cluster2

I suspect that I got it completely wrong.
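The empty results are consistent with the column index being NULL. output <- list() replaces whatever read.csv() returned with an empty list, and $ on a name that does not exist (cluster1 instead of cluster.1) also returns NULL without raising an error. A short base-R check, using a hypothetical mock of output:

```r
# Hypothetical mock of the list structure
output <- list(list = list(cluster.1 = c(1, 4, 5), cluster.2 = 2))

# A wrong name silently gives NULL rather than an error
is.null(output$list$cluster1)    # TRUE: the element is called "cluster.1"
is.null(output$list$cluster.1)   # FALSE

# Overwriting with an empty list loses everything, so every lookup is NULL
output <- list()
is.null(output$list$cluster.1)   # TRUE

# Checking the names first makes such a mistake visible
output <- list(list = list(cluster.1 = c(1, 4, 5), cluster.2 = 2))
stopifnot("cluster.1" %in% names(output$list))
```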

I also ran the code without output <- list().

EDIT: I think the problem is that we are not using the correct name for output$list$cluster2. Try output$list$cluster.2 instead. I have changed the block below accordingly. Please try:

output <- read.csv("Clusters.csv", header = TRUE)
# take a look at output
output

cluster.data <- matrix(extremes, nrow = 2019, ncol = 30, byrow = TRUE) 
DT <- as.data.table(cluster.data)
cluster1 <- DT[, c(output$list$cluster.1), with = FALSE]
cluster1
cluster2 <- DT[, c(output$list$cluster.2), with = FALSE]
cluster2

EDIT: We are nearly there! Please print output and output$list$cluster.1, and also run str(output$list$cluster.2) to see its class. Finally, if this does not work, use dput to write output to a file and look at it in Notepad or another text editor (dput writes data out as the R commands needed to recreate it). Post the result so we can check output.
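dput() can write to a file directly, and dget() reads such a file back. A minimal sketch; the mock output list and the file name output_dput.txt are only examples.

```r
# Hypothetical mock of 'output'
output <- list(list = list(cluster.1 = c(1, 4, 5), cluster.2 = 2))

# Write the object as R code that recreates it
dput_file <- file.path(tempdir(), "output_dput.txt")
dput(output, file = dput_file)

# The file is plain text that can be opened in Notepad or pasted into a post
cat(readLines(dput_file), sep = "\n")

# dget() evaluates that text and rebuilds the object
output2 <- dget(dput_file)
```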


Answers (1)

micstr

It's a bit tricky without your data. Please take a look at the data.table cheat sheet if you are unfamiliar with this package.

I am assuming your columns have no special names, so they get the default names V1, V2, ... Let's isolate your two blocks so you can save them.

library(data.table)

# mini mockup example using just first 5 columns
output <- list()
output$list$cluster.1 <- c(1,4,5)
output$list$cluster.2 <- c(2)
# EDIT: Kostas you would do this with your data
#  "output I save in the csv file (called 'Clusters.csv')"
# get the output structure back
# output <- read.csv("Clusters.csv", header = TRUE)
# Then the code will read your list results

# mockup of your data using a to e so we can see how the columns are selected
#   it's simply two rows of a b c d e
cluster.data <- matrix(letters[1:5], nrow = 2, ncol = 5, byrow = TRUE) 

# assuming the column names will just be the defaults V1, V2, ...
#  cluster 1 we would expect it to look like this
#  headings     V1 V4 V5
#  data         a d e 
#  data line 2  a d e 


# turn it into a data.table
#   you would read your data in as csv 
#   data <- as.data.table(read.csv("yourfile.csv")) etc.
DT <- as.data.table(cluster.data)

# subset data to cluster 1
cluster1 <- DT[, c(output$list$cluster.1), with = FALSE]
cluster1
#    V1 V4 V5
# 1:  a  d  e
# 2:  a  d  e

# likewise for 2
cluster2 <- DT[, c(output$list$cluster.2), with = FALSE]
cluster2
#    V2
# 1:  b
# 2:  b

Note that I am using with = FALSE so that data.table treats the numbers in j as column positions (e.g. the 4th column) rather than evaluating j as an ordinary expression.

You would then save these blocks, using write.table or write.csv. Type ?write.table at the prompt to get help.
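For example, with base R's write.csv(), where row.names = FALSE stops the row numbers being written as an extra column. The data.frame below is a stand-in for the real subset, and tempdir() is used only so the sketch does not write into your working directory.

```r
# Stand-in for the cluster1 subset produced above
cluster1 <- data.frame(V1 = c("a", "a"), V4 = c("d", "d"), V5 = c("e", "e"))

out_file <- file.path(tempdir(), "Cluster1.csv")
write.csv(cluster1, out_file, row.names = FALSE)

# Reading it back gives the same columns
back <- read.csv(out_file)
```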

You can "parameterize" this for a different number of clusters using as.name(paste0("cluster.", as.character(i))), which gives cluster.3 when i = 3.
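One way to loop over all clusters, sketched in base R: build each name with paste0() and look it up with [[ ]], which accepts a character string directly. This assumes the cluster members are available as a list (output$list here, or inf$list if you use the analysis object directly) and that NumberClusters holds the optimal number of clusters; the mock extremes and output objects are hypothetical.

```r
# Hypothetical stand-ins for the real objects
extremes <- as.data.frame(matrix(rnorm(30 * 5), nrow = 5, ncol = 30))
output <- list(list = list(
  cluster.1 = c(1, 4, 5, 6, 7, 9, 11, 13, 20, 24, 25, 26, 27, 28, 29, 30),
  cluster.2 = c(2, 8, 14, 16, 21, 22, 23)
))
NumberClusters <- 2

out_dir <- tempdir()
clusters <- vector("list", NumberClusters)
for (i in seq_len(NumberClusters)) {
  # e.g. looks up "cluster.2" when i = 2
  cols <- output$list[[paste0("cluster.", i)]]
  clusters[[i]] <- extremes[, cols, drop = FALSE]
  # Writes Cluster1.csv, Cluster2.csv, ...
  write.csv(clusters[[i]],
            file.path(out_dir, paste0("Cluster", i, ".csv")),
            row.names = FALSE)
}
sapply(clusters, ncol)  # 16 7
```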

Hope this helps!

LATER EDIT: Kostas, I see that the element in your output is called cluster.1 ($list$cluster.1), not cluster1 as I originally had, so I have edited my code above.
