Reputation: 25
I have a single data frame with 100 columns and 25 rows. I would like to cbind different groupings of columns (sometimes as many as 30) into several new data frames without having to type out each column name every time. Some of the columns I want stand alone, e.g. 6 and 72, and some lie next to each other, e.g. columns 23, 24, 25, 26 (23:26).
Usually I would use:
z <- cbind(visco$fish, visco$bird)
for example, but I have too many columns and need to create too many new data frames to type the name of every column I need each time. Generally I do not attach my data.
I would like to use column numbers, something like:
z <- cbind(6 , 72 , 23:26, data=visco)
and also retain the original column names, not the automatically generated V1, V2. I have tried adding deparse.level=2, but my column names then become "visco$fish" rather than the original "fish".
I feel there should be a simple answer to this, but so far I have failed to find anything that works as I would like.
Upvotes: 1
Views: 35010
Reputation: 1
In R we have vectors and matrices. You can create your own vectors with the function c.
c(1,5,3,4)
They are also the output of many functions such as
rnorm(10)
You can turn vectors into matrices using functions such as rbind, cbind or matrix.
Create the matrix from the vector 1:1000 like this:
X = matrix(1:1000,100,10)
What is the entry in row 25, column 3?
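A minimal sketch of how that entry could be looked up, assuming the same X as above. matrix() fills column by column, so column 3 holds the values 201 to 300.

```r
# Same matrix as above: 100 rows, 10 columns, filled column by column
X <- matrix(1:1000, nrow = 100, ncol = 10)

# Index with [row, column]
X[25, 3]  # 225: column 3 starts at 201, and row 25 is 24 steps further
```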
Upvotes: -1
Reputation: 4648
I understand your question as subsetting a large data frame into smaller ones, which can be achieved in different ways. One way is the data.table package, which lets you subset by column index and still retain the column names.
If you have your data as a data frame, you can just do
DT <- data.table(df)
# You still have to define the subsets of columns you need
sub_1 <- c(2, 3)
sub_2 <- c(2:5, 9)
sub_3 <- c(1:2, 5:6, 10)
# with = FALSE makes data.table treat sub_2 as column positions
DT[, sub_2, with = FALSE]
Output
bird cat dog rat car
1: 0.2682538 0.1386834 0.01633384 0.5336649 0.43432878
2: 0.2418727 0.7530654 0.26999873 0.2679446 0.00859734
3: 0.1211858 0.2563736 0.92637523 0.8572615 0.63165705
4: 0.4556401 0.2343427 0.09324584 0.8731174 0.50098461
5: 0.1646126 0.9258622 0.86957980 0.3636781 0.89608415
Data
library(data.table)
# 5 rows x 10 columns, so 5*10 random values are needed
DT <- data.table(matrix(runif(5*10), 5, 10))
colnames(DT) <- c("fish","bird","cat","dog","rat","tiger","insect","boat","car", "cycle")
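Since the question asks for several new data frames, one possible extension is to keep the index vectors in a named list and build all the subsets in one pass. This is only a sketch in base R rather than data.table; the names df, groups and subsets are illustrative assumptions, and the column names are borrowed from the example data above.

```r
set.seed(1)  # only so the random example data is reproducible

# Plain data frame with the same column names as the example data
df <- as.data.frame(matrix(runif(5 * 10), nrow = 5, ncol = 10))
names(df) <- c("fish", "bird", "cat", "dog", "rat",
               "tiger", "insect", "boat", "car", "cycle")

# One entry per group of column positions
groups <- list(sub_1 = c(2, 3),
               sub_2 = c(2:5, 9),
               sub_3 = c(1:2, 5:6, 10))

# One data frame per group; drop = FALSE guards against single-column groups
subsets <- lapply(groups, function(idx) df[, idx, drop = FALSE])

names(subsets$sub_2)  # "bird" "cat" "dog" "rat" "car"
```

Each element of subsets is an ordinary data frame that keeps the original column names.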
Upvotes: 0
Reputation: 3833
df <- data.frame(AA = 11:15, BB = 2:6, CC = 12:16, DD = 3:7, EE = 23:27)
df
# AA BB CC DD EE
# 1 11 2 12 3 23
# 2 12 3 13 4 24
# 3 13 4 14 5 25
# 4 14 5 15 6 26
# 5 15 6 16 7 27
df1 <- data.frame(cbind(df,df,df,df))
df1
# AA BB CC DD EE AA.1 BB.1 CC.1 DD.1 EE.1 AA.2 BB.2 CC.2 DD.2 EE.2 AA.3 BB.3
# 1 11 2 12 3 23 11 2 12 3 23 11 2 12 3 23 11 2
# 2 12 3 13 4 24 12 3 13 4 24 12 3 13 4 24 12 3
# 3 13 4 14 5 25 13 4 14 5 25 13 4 14 5 25 13 4
# 4 14 5 15 6 26 14 5 15 6 26 14 5 15 6 26 14 5
# 5 15 6 16 7 27 15 6 16 7 27 15 6 16 7 27 15 6
# CC.3 DD.3 EE.3
# 1 12 3 23
# 2 13 4 24
# 3 14 5 25
# 4 15 6 26
# 5 16 7 27
Result <- df1[, c(1:5, 14:17, 20)]
Result
# AA BB CC DD EE DD.2 EE.2 AA.3 BB.3 EE.3
# 1 11 2 12 3 23 3 23 11 2 23
# 2 12 3 13 4 24 4 24 12 3 24
# 3 13 4 14 5 25 5 25 13 4 25
# 4 14 5 15 6 26 6 26 14 5 26
# 5 15 6 16 7 27 7 27 15 6 27
Note: data.frame() makes duplicated column names unique by appending .1, .2, and so on to the later occurrences.
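Applied to the data frame in the question, the same positional indexing might look like the sketch below. The visco built here is a hypothetical stand-in (25 rows, 100 columns named var1 to var100), since the real data is not shown.

```r
# Hypothetical stand-in for the asker's data: 25 rows, 100 named columns
visco <- as.data.frame(matrix(seq_len(25 * 100), nrow = 25, ncol = 100))
names(visco) <- paste0("var", seq_len(100))

# Single columns and ranges can be mixed in one index vector;
# the original column names are kept
z <- visco[, c(6, 72, 23:26)]
names(z)  # "var6" "var72" "var23" "var24" "var25" "var26"

# drop = FALSE keeps a data frame even when only one column is selected
one_col <- visco[, 6, drop = FALSE]
```

Indexing the data frame directly avoids cbind on the individual vectors, which is where the original column names get lost.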
Upvotes: 2
Reputation: 851
Here's an example of how to do this using the select function from dplyr, which should be your go-to package for this type of data wrangling:
> library(dplyr)
> df <- head(iris)
> df
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
>
> ## select by variable name
> newdf <- df %>% select(Sepal.Length, Sepal.Width, Species)
> newdf
Sepal.Length Sepal.Width Species
1 5.1 3.5 setosa
2 4.9 3.0 setosa
3 4.7 3.2 setosa
4 4.6 3.1 setosa
5 5.0 3.6 setosa
6 5.4 3.9 setosa
> ## select by variable indices
> newdf <- df %>% select(1:2, 5)
> newdf
Sepal.Length Sepal.Width Species
1 5.1 3.5 setosa
2 4.9 3.0 setosa
3 4.7 3.2 setosa
4 4.6 3.1 setosa
5 5.0 3.6 setosa
6 5.4 3.9 setosa
However, I'm not sure why you would need to do this. Can you not run your analyses on the original data frame?
Upvotes: 0