Ville

Reputation: 547

R: Concatenate data frame column name with a number

I have this simple code that creates a matrix and a data frame:

   mat=matrix(rnorm(40*5), ncol=5)
   df=as.data.frame(mat)
   df2 <- tidyr::gather(df, "x", "y", V1:V5)

Here is the head(df) created by mat:

           V1          V2          V3          V4          V5
1   0.97111725  0.12937942 -0.89643594 -0.30144874  0.10405400
2   0.68372321 -0.08049954 -0.52891953 -0.56752185 -1.04425728
3   1.04553733  0.24499356  0.25919424 -1.51280159  0.70952009
4   0.16433896 -0.46727565 -0.22030923  1.18732203  0.17529333
5  -1.73732058  0.04977374  1.54042252 -1.27585563 -1.05846972
6   0.35953274  3.09224985 -1.24524965 -0.67492542 -0.68065365

I then create another data frame df2 where I gather the values onto two columns x and y.

df2 <- tidyr::gather(df, "x", "y", V1:V5)

Whenever I get a new dataset whose matrix has a different number of columns, I have to edit the column range in the gather() call by hand.

Example: here I use V5 because I have 5 columns: df2 <- tidyr::gather(df, "x", "y", V1:V5). If I get a new matrix with 20 columns, I have to change that manually to V20: df2 <- tidyr::gather(df, "x", "y", V1:V20).

Is there any way to write it like df2 <- tidyr::gather(df, "x", "y", V1:V+ncol(mat)), so that the last column name is built from the number of columns?

Upvotes: 2

Views: 2603

Answers (3)

MKR

Reputation: 20095

A simple column index works with gather as well.

The gather documentation describes the ... argument as:

A selection of columns. If empty, all variables are selected. You can supply bare variable names, select all variables between x and z with x:z, exclude y with -y. For more options, see the dplyr::select() documentation.

mat=matrix(rnorm(40*5), ncol=5)
df=as.data.frame(mat)

df2 <- tidyr::gather(df, "x", "y", 1:5)

#OR
df2 <- tidyr::gather(df, "x", "y", V1:V5)

#OR
df2 <- tidyr::gather(df, "x", "y")  #all columns 


head(df2)
#    x          y
# 1 V1 -0.7403657
# 2 V1 -0.7501310
# 3 V1  2.0371748
# 4 V1 -1.2647994
# 5 V1  1.3464162
# 6 V1 -1.8981365

tail(df2)
#      x          y
# 195 V5 -2.2739219
# 196 V5 -0.8606414
# 197 V5 -0.8102747
# 198 V5  0.6362617
# 199 V5  0.9962820
# 200 V5  1.6503455
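
Since an index range is accepted, the upper bound can be computed from the data instead of typed in. The following is a minimal sketch, not part of the original answer; the 20-column matrix and the set.seed call are assumptions made only to keep the example reproducible.

```r
library(tidyr)

set.seed(1)  # assumption: fixed seed, only for reproducibility

# A matrix with a different number of columns than the original example
mat <- matrix(rnorm(40 * 20), ncol = 20)
df <- as.data.frame(mat)

# seq_len(ncol(df)) expands to 1, 2, ..., 20, so no column count is hard-coded
df2 <- gather(df, "x", "y", seq_len(ncol(df)))

dim(df2)
# [1] 800   2
```

The same call works unchanged for a 5-column or 40-column matrix, because the selection is derived from ncol(df) each time.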

Upvotes: 1

Mike H.

Reputation: 14360

It also looks like you can simply pass the columns as characters like:

df3 <- tidyr::gather(df, "x", "y", names(df)[1]:names(df)[5])

Or, as you specifically ask in your example, to go from V1 to the last column (V followed by ncol(df)), you can do:

df3 <- tidyr::gather(df, "x", "y", "V1":tail(names(df),1))

And then comparing to your result:

identical(df2,df3)
#[1] TRUE

This might be nice because it lets you programmatically choose any range of columns you want, unlike the option where you omit the column selection and gather automatically takes all of them.
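
To illustrate the point about arbitrary ranges, here is a hedged sketch that gathers only a middle block of columns. The bounds first and last are hypothetical names introduced for this example, and all_of() assumes a tidyr version recent enough to re-export it from tidyselect (1.0.0 or later).

```r
library(tidyr)

set.seed(42)  # assumption: fixed seed, only for reproducibility
df <- as.data.frame(matrix(rnorm(40 * 5), ncol = 5))

# Hypothetical bounds, e.g. read from a config or computed elsewhere
first <- 2
last <- 4

# Build the character vector of column names for the chosen range
cols <- names(df)[first:last]

# all_of() tells the selection to treat `cols` as a vector of column names
df3 <- gather(df, "x", "y", all_of(cols))

unique(df3$x)
# [1] "V2" "V3" "V4"
```

The columns outside the range (V1 and V5 here) are kept as-is and repeated alongside the gathered x and y columns.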

Upvotes: 1

Lodewic Van Twillert

Reputation: 782

Yes! You can use paste0.

df2 <- tidyr::gather(df, "x" ,"y", V1:paste0("V", ncol(mat)))

Of course you're using all the columns so you don't need to specify the names. But in cases where you truly want to reference variable column names, this is how I go about it.

Alternatively, if you want to use all columns starting with "V", you could do

df2 <- tidyr::gather(df, "x", "y", dplyr::starts_with("V"))
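
Another option along the same lines, shown here as a sketch rather than as part of the original answer, is the num_range() selection helper, which matches a prefix followed by a run of numbers. The 12-column matrix is an assumption chosen for illustration.

```r
library(tidyr)

set.seed(7)  # assumption: fixed seed, only for reproducibility
mat <- matrix(rnorm(40 * 12), ncol = 12)
df <- as.data.frame(mat)

# num_range("V", 1:12) selects V1, V2, ..., V12;
# the upper bound comes from ncol(mat) rather than being typed in
df2 <- gather(df, "x", "y", num_range("V", seq_len(ncol(mat))))

nrow(df2)
# [1] 480
```

Unlike starts_with("V"), this only picks up columns named V followed by one of the given numbers, so an unrelated column such as a hypothetical Value would be left alone.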

Upvotes: 2
