Data frame column names in R are duplicated

Question

I just generated an empty data frame and then filled it with random data, but I started having problems with the column names. Basically, I want to clean those names and make them unique.

I am sharing my entire code, as I do not know what is causing the trouble:

df <- data.frame(ID=integer(),AA=character(),AAA=integer(),Z=double(),stringsAsFactors=FALSE) #empty dataframe
df <- data.frame(ID=c(1:10)) #Consecutive numbers
AA <- sample(c("yes","no"), 10, replace=TRUE, prob = c(0.53, 0.47)) #Random data
df$AA<-as.data.frame(AA)
AAA<-sample(22:60, size=10, replace=TRUE) #Random data
df$AAA<-as.data.frame(AAA)
df$Z<-df$Z <- with(df, (AA == 'yes') * 0.25 + (AAA < 30) * 0.25) #calculated field
df

This is the head of the resulting data frame. The last column should be named Z, not AA:

   ID  AA AAA   AA
1   1 yes  56 0.25
2   2  no  53 0.00

I then tried to rename the column with colnames(df)[4] <- "Z" and got the same result.

When I view the data frame in RStudio, it looks like this:

   ID  AA.AA AAA.AAA   Z
1   1   yes    56   0.25
2   2    no    53   0.00

The problem arises when I try to compute some descriptive statistics:

library("GGally")
ggpairs(df)
plot: [1,2] [===>---------------------------] 12% est: 1s
Error in `[.data.frame`(xData, rows) : undefined columns selected

Thanks in advance

Ronak Shah · Accepted Answer

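You are creating nested data frames inside your data frame: as.data.frame(AA) returns a data frame, and assigning it to df$AA stores that whole data frame as a single column. Assign the vectors directly instead: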

df <- data.frame(ID=c(1:10)) #Consecutive numbers
df$AA<- sample(c("yes","no"), 10, replace=TRUE, prob = c(0.53, 0.47)) #Random data
df$AAA<-sample(22:60, size=10, replace=TRUE) #Random data
df$Z <- with(df, (AA == 'yes') * 0.25 + (AAA < 30) * 0.25) #calculated field

GGally::ggpairs(df)
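
If you want to confirm the nesting in an existing data frame before rebuilding it, a minimal sketch along these lines may help. It reproduces the problem with one column, checks which columns are themselves data frames, and then flattens them. The names `nested` and `flat` are only illustrative, and `set.seed(1)` is added here purely for reproducibility.

```r
set.seed(1)  # only so the random data is reproducible

df <- data.frame(ID = 1:10)
AA <- sample(c("yes", "no"), 10, replace = TRUE, prob = c(0.53, 0.47))
df$AA <- as.data.frame(AA)  # reproduces the problem: a data frame stored as a column

# Every column should be a plain vector; TRUE here marks a nested data frame.
nested <- sapply(df, is.data.frame)
nested  # ID is FALSE, AA is TRUE

# str() also reveals the nesting: AA is listed as a 'data.frame', not as 'chr'.
str(df)

# One way to flatten: pass the columns to data.frame(), which expands
# nested data frames into ordinary columns.
flat <- do.call(data.frame, df)
sapply(flat, is.data.frame)  # all FALSE after flattening
```

Flattening is a repair for data you already have. When you control the code that builds the data frame, assigning the vectors directly as shown above is simpler.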
