Joan
Joan

Reputation: 88

how to extract two columns of data using R by loop

I have a dataframe with 1000 columns of data

 str(MT)
 'data.frame':  1356 obs. of  1000 variables:
 $ Date : Factor w/ 1356 levels "Apr-1900","Apr-1901",..: 453 340 792 1 905  679 566 114 1244 1131 ...
 $ Year : int  1900 1900 1900 1900 1900 1900 1900 1900 1900 1900 ...
 $ X1   : num  -27.4 -27.8 -17 1.7 7.9 ...
 $ X2   : num  -27.21 -27.99 -17.05 1.69 7.75 ...
 $ X3   : num  -26.67 -27.84 -16.75 2.24 7.82 ...
 $ X4   : num  -26.64 -27.98 -16.83 2.46 7.97 ...
  .....
 $ X1000  : num  -29.13 -30.61 -20.47 -0.46 6.5

I would like to split this dataframe into three columns ( Date, Year and Xn) using a loop so that the end of it all I will have 1000 separate csv files with 3 columns of data. My codes thus far is

for (i in ncol(MT)) {
x[[i]]<-data.frame(MT$Date, Year, MT$[[i]]) }

However, is giving me errors. Your guidance would be appreciated as this I am new to R

Upvotes: 0

Views: 2965

Answers (2)

Ricky
Ricky

Reputation: 4686

Reusing the sample data created by @Molx, and doing some reshaping as @Neal Fultz suggested in comments, using tidyr

# generate sample data
MT <- data.frame(Date = rnorm(5), Year = rnorm(5), X1 = rnorm(5), X2 = rnorm(5), X3 = rnorm(5))

Then fit all variables and values excluding Date and Year as key-value column pairs

> require(tidyr)
> MTg <- gather(MT, var, value, -c(Date, Year))
> MTg
         Date       Year var       value
1  -1.5356474 -1.0963886  X1 -0.74075807
2  -1.1346928  0.2925819  X1  1.42787059
3   0.7031032  0.3361561  X1 -0.27112156
4   1.0140557  1.2587298  X1  0.85693377
5   0.2529787 -3.0113663  X1  0.12686607
6  -1.5356474 -1.0963886  X2  0.21406288
7  -1.1346928  0.2925819  X2 -1.11363330
8   0.7031032  0.3361561  X2 -0.30324978
9   1.0140557  1.2587298  X2  0.48954893
10  0.2529787 -3.0113663  X2  0.85898166
11 -1.5356474 -1.0963886  X3 -0.44394680
12 -1.1346928  0.2925819  X3 -0.86942530
13  0.7031032  0.3361561  X3 -1.62344294
14  1.0140557  1.2587298  X3  0.09880026
15  0.2529787 -3.0113663  X3 -0.76091871

Then run through all possible variable names, exporting them into individual csv files with same name as var.

varnames <- levels(MTg$var)  # get variable names
dummy <- lapply(varnames, function(x)
  write.csv(MTg[MTg$var==x,], file=paste0(x, ".csv"))

Upvotes: 0

Molx
Molx

Reputation: 6931

Your code has some syntax and algorithm errors:

  1. Your for loop is not looping through a range of values, it's "looping" once for i = ncol(MT), it should be (i in 1:ncol(MT)) ;
  2. Actually, you shouldn't loop through all columns, since two of them aren't Xn, so (i in 1:(ncol(MT)-2));
  3. It's not clear if you did, but you should create x before trying to allocate data to it, preferably with its final size;
  4. You didn't use MT$ to select the Year column;
  5. You used both $ and [[ to subset the Xn column. You should use just [ instead, because this way you get to use i and keep the column name.

Fixing all these, with some example data, you get:

MT <- data.frame(Date = rnorm(5), Year = rnorm(5), X1 = rnorm(5), X2 = rnorm(5), X3 = rnorm(5))

nX <- ncol(MT)-2

listofdf <- lapply(1:nX, function(x) NULL)

for (i in 1:nX) {
  listofdf[[i]] <- data.frame(MT$Date, MT$Year, MT[i+2])
}

listofdf
# [[1]]
# MT.Date    MT.Year         X1
# 1 -0.94184053  1.0241134 -0.4329728
# 2  0.59637577 -0.6195477 -1.3011527
# 3  0.33474278  1.0628674 -0.8957239
# 4 -0.04328685  0.4275993 -0.7840214
# 5  0.78799652  0.5707058 -0.4243622
# 
# [[2]]
# MT.Date    MT.Year         X2
# 1 -0.94184053  1.0241134  2.2380838
# 2  0.59637577 -0.6195477 -0.9995170
# 3  0.33474278  1.0628674  0.3452450
# 4 -0.04328685  0.4275993 -1.0453718
# 5  0.78799652  0.5707058 -0.6292885
# 
# [[3]]
# MT.Date    MT.Year          X3
# 1 -0.94184053  1.0241134 -0.05293727
# 2  0.59637577 -0.6195477  0.84947635
# 3  0.33474278  1.0628674  1.17748809
# 4 -0.04328685  0.4275993  1.73233398
# 5  0.78799652  0.5707058 -0.61874653

If you're just going to save them as .csv files, it's not necessary to store in a list though. Instead, you can use:

for (i in 1:nX) {
  tempdf <- data.frame(MT$Date, MT$Year, MT[i+2])
  write.csv(tempdf, paste0("MT_subset_X", i, ".csv"))
}

Upvotes: 1

Related Questions