Warpspeed SCP
Warpspeed SCP

Reputation: 174

How to append an arbitrary sequence to a dataframe column

I am trying to add values to a dataframe in a for loop like this -

z <- data.frame()
rnms <- rownames(prevdata)
for (i in 1:nrow(prevdata)) {
    s <- stack(prevdata[i, ])
    # Here, the names of the second and third columns are 'ind' and 'values' even though I've specified the column names... idk why.
    z <- rbind(z, data.frame(year=rnms[i], month=s[2], values=s[1])) 
    # Get months remaining.
    tmp <- nrow(s)+1 
    # try to assign the data for missing months as zero. 
    # This fails and assigns NA to the remaining months instead of the values from tmp to 12.   
    z <- rbind(z, data.frame(year=rnms[i], ind=seq(tmp:12) + tmp - 1, values=0)) 
}                             Doesn't work ^^^^^^^^^^^^^^^^^^^^^^^^^
colnames(z) <- c("YEAR", "MONTH", "values")

I don't understand why this does not work.

Doing seq(tmp:12) instead of seq(tmp:12) + tmp - 1 works, but leads to a range from 1 to tmp being assigned instead of a range from tmp to 12. Here's a sample from prevdata -

>head(prevdata)
         1     2     3     4     5     6     7
1963  74.1 155.2 226.4 186.7  37.2  23.6  23.6
1964 135.3 204.9 180.6 223.4 186.0 111.6 127.0
1965  69.3  24.2 151.9 321.3 121.7 147.8  44.1
1966  21.1 125.7  62.1 109.8 101.3  68.0 123.9
1967  12.4 118.0 245.4 269.6 139.1 117.1  63.1
1968  69.8  37.2 264.8 112.8 133.5  24.2  62.3

When I use ind=seq(tmp:12) + tmp - 1 I get-

> z[1:12,]
   year  ind values
1  1963    1   74.1
2  1963    2  155.2
3  1963    3  226.4
4  1963    4  186.7
5  1963    5   37.2
6  1963    6   23.6
7  1963    7   23.6
8  1963 <NA>    0.0 < ind here should be 8, 9, 10...
9  1963 <NA>    0.0
10 1963 <NA>    0.0
11 1963 <NA>    0.0
12 1963 <NA>    0.0

When I use ind=seq(tmp:12) I get-

> z[1:12,]
   year  ind values
1  1963    1   74.1
2  1963    2  155.2
3  1963    3  226.4
4  1963    4  186.7
5  1963    5   37.2
6  1963    6   23.6
7  1963    7   23.6
8  1963    1    0.0 < Same thing.
9  1963    2    0.0
10 1963    3    0.0
11 1963    4    0.0
12 1963    5    0.0

On its own, seq(tmp:12) + tmp - 1 returns what I want-

> tmp <- 8
> seq(tmp:12) + tmp - 1
[1]  8  9 10 11 12

But it doesn't work when I try it with my code above. What am I doing wrong?

EDIT

Apart from the nonstandard seq(x:y) version, I also tried this with seq(x, y). That gave me exactly the same results as what I have mentioned above.

I tried ind=seq(tmp, 12) which gave me what I wanted when executed separately, but caused the NA error in the output, as well as ind=seq(1, 12 - tmp + 1), which didn't cause warnings to generate but resulted in the same output as ind=seq(tmp:12).

Upvotes: 1

Views: 85

Answers (1)

Dave2e
Dave2e

Reputation: 24079

The problem was the stack function converted the independent variable into a factor, thus the rbind function failed when attempting to bind unknown factors into the data.frame.
In this solution I made sure all columns of s were not factors and by adding stringsAsFactor= FALSE to the data.frame function.

prevdata<-read.table(header=TRUE, 
text="1     2     3     4     5     6     7
1963  74.1 155.2 226.4 186.7  37.2  23.6  23.6
1964 135.3 204.9 180.6 223.4 186.0 111.6 127.0
1965  69.3  24.2 151.9 321.3 121.7 147.8  44.1
1966  21.1 125.7  62.1 109.8 101.3  68.0 123.9
1967  12.4 118.0 245.4 269.6 139.1 117.1  63.1
1968  69.8  37.2 264.8 112.8 133.5  24.2  62.3")

names(prevdata)<-1:7
rnms <- rownames(prevdata)
z<-data.frame()  #initialize the dataframe prior to binding 
for (i in 1:nrow(prevdata)) {
  s <- stack(prevdata[i, ])

  #Need to convert factor column to numeric
  s$ind<-as.numeric(as.character(s$ind))
  #need to prevent the characters strings from being converted to factors
  z <- rbind(z, data.frame(year=rnms[i], month=s[2], values=s[1], stringsAsFactors = FALSE)) 

  # Get months remaining.
  tmp <- nrow(s)+1 
  # try to assign the data for missing months as zero. 
  z <- rbind(z, data.frame(year=rnms[i], ind=seq(tmp, 12), values=0)) 
}                            
colnames(z) <- c("YEAR", "MONTH", "values")

This solution works, there are better ways to solve this problem. For example you could add columns 8 to 12 onto the original dataframe frame and then convert to a long format with reshape, melt or tidyr:gather

Edit:

Using the tidyr package:

prevdata[,8:12]<-0
names(prevdata)<-1:12
prevdata$year<-row.names(prevdata)
library(tidyr)
z<-gather(prevdata, month, values, -year)
z[order(z$YEAR, z$MONTH),]

Upvotes: 1

Related Questions