Reputation: 547
I have this simple code that creates a matrix and a data frame:
mat=matrix(rnorm(40*5), ncol=5)
df=as.data.frame(mat)
df2 <- tidyr::gather(df, "x", "y", V1:V5)
Here is the head(df) created by mat:
V1 V2 V3 V4 V5
1 0.97111725 0.12937942 -0.89643594 -0.30144874 0.10405400
2 0.68372321 -0.08049954 -0.52891953 -0.56752185 -1.04425728
3 1.04553733 0.24499356 0.25919424 -1.51280159 0.70952009
4 0.16433896 -0.46727565 -0.22030923 1.18732203 0.17529333
5 -1.73732058 0.04977374 1.54042252 -1.27585563 -1.05846972
6 0.35953274 3.09224985 -1.24524965 -0.67492542 -0.68065365
I then create another data frame, df2, where I gather the values into two columns, x and y:
df2 <- tidyr::gather(df, "x", "y", V1:V5)
When I get a new dataset where the matrix has a different number of columns, I have to change the last column name in the df2 call by hand.
Example: here I use V5 because I have 5 columns: df2 <- tidyr::gather(df, "x", "y", V1:V5)
If I now get a new matrix with 20 columns, I have to change that manually to V20: df2 <- tidyr::gather(df, "x", "y", V1:V20)
Is there any way to write it so the last column is derived from the number of columns, something like: df2 <- tidyr::gather(df, "x", "y", V1:V+ncol(mat))?
Upvotes: 2
Views: 2603
Reputation: 20095
A simple column index works with gather as well.
The documentation of gather describes the ... argument as: "A selection of columns. If empty, all variables are selected. You can supply bare variable names, select all variables between x and z with x:z, exclude y with -y. For more options, see the dplyr::select() documentation."
mat=matrix(rnorm(40*5), ncol=5)
df=as.data.frame(mat)
df2 <- tidyr::gather(df, "x", "y", 1:5)
#OR
df2 <- tidyr::gather(df, "x", "y", V1:V5)
#OR
df2 <- tidyr::gather(df, "x", "y") #all columns
head(df2)
# x y
# 1 V1 -0.7403657
# 2 V1 -0.7501310
# 3 V1 2.0371748
# 4 V1 -1.2647994
# 5 V1 1.3464162
# 6 V1 -1.8981365
tail(df2)
# x y
# 195 V5 -2.2739219
# 196 V5 -0.8606414
# 197 V5 -0.8102747
# 198 V5 0.6362617
# 199 V5 0.9962820
# 200 V5 1.6503455
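For the case in the question, where the number of columns changes between datasets, the index range can be computed from the data frame itself instead of being typed in. This is a minimal sketch, assuming tidyr is installed and that every column should be gathered; the 20-column matrix and the seed are made up for illustration.

```r
library(tidyr)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
mat <- matrix(rnorm(40 * 20), ncol = 20)  # hypothetical wider dataset
df <- as.data.frame(mat)

# 1:ncol(df) expands to 1:20 here, so nothing needs editing
# when the number of columns changes
df2 <- gather(df, "x", "y", 1:ncol(df))

dim(df2)
# [1] 800   2
```

The same call works unchanged for the original 5-column matrix, where it gives 200 rows.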
Upvotes: 1
Reputation: 14360
It also looks like you can simply pass the columns as characters like:
df3 <- tidyr::gather(df, "x", "y", names(df)[1]:names(df)[5])
Or, as you specifically state in your example, to go from V1 to V+ncol(df) you can do:
df3 <- tidyr::gather(df, "x", "y", "V1":tail(names(df),1))
And then comparing to your result:
identical(df2,df3)
#[1] TRUE
This might be nice because it gives you the flexibility to choose any range of columns programmatically, whereas leaving out the column selection automatically gathers all of them.
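To illustrate picking an arbitrary range programmatically, here is a rough sketch that gathers only the second through fourth columns. It assumes tidyr (and therefore tidyselect) is installed; the variable name cols and the choice of columns 2 to 4 are just for illustration. Wrapping the character vector in tidyselect::all_of() makes it explicit that the names come from a variable rather than from the data frame.

```r
library(tidyr)

mat <- matrix(rnorm(40 * 5), ncol = 5)
df <- as.data.frame(mat)

# Build the column names at run time; here "V2", "V3", "V4"
cols <- names(df)[2:4]

# Gather only those columns; V1 and V5 are kept as they are
df3 <- gather(df, "x", "y", tidyselect::all_of(cols))

names(df3)
# [1] "V1" "V5" "x"  "y"
nrow(df3)
# [1] 120
```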
Upvotes: 1
Reputation: 782
Yes! You can use paste0.
df2 <- tidyr::gather(df, "x" ,"y", V1:paste0("V", ncol(mat)))
Of course, you're using all the columns, so you don't need to specify the names. But in cases where you truly want to reference variable column names, this is how I go about it.
Alternatively, if you want to use all columns starting with "V", you could do:
df2 <- tidyr::gather(df, "x", "y", dplyr::starts_with("V"))
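Another selection helper that maps closely onto the "V1 to V+ncol(mat)" idea is num_range(), which builds names from a prefix and a numeric range. The sketch below assumes tidyr and tidyselect are installed, and the 8-column matrix is made up for illustration. It compares the helper with the paste0 approach above.

```r
library(tidyr)

mat <- matrix(rnorm(40 * 8), ncol = 8)  # hypothetical 8-column dataset
df <- as.data.frame(mat)

# Range built with paste0, as in the answer above
by_paste <- gather(df, "x", "y", V1:paste0("V", ncol(mat)))

# Same columns selected by prefix plus number: V1, V2, ..., V8
by_range <- gather(df, "x", "y", tidyselect::num_range("V", 1:ncol(mat)))

# Both selections cover the same columns, so the results should match
identical(by_paste, by_range)
```

num_range() only makes sense when the columns really follow the prefix-plus-number pattern, as the default V1, V2, ... names from as.data.frame() do.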
Upvotes: 2