mauron

Reputation: 19

Data frame column names in R are duplicated

I generated an empty data frame and then filled it with random data, but I started having problems with the column names. I want to clean up the names and make them unique.

I am sharing my entire code because I do not know what is causing the trouble:

df <- data.frame(ID=integer(),AA=character(),AAA=integer(),Z=double(),stringsAsFactors=FALSE) #empty dataframe
df <- data.frame(ID=c(1:10)) #Consecutive numbers
AA <- sample(c("yes","no"), 10, replace=TRUE, prob = c(0.53, 0.47)) #Random data
df$AA<-as.data.frame(AA)
AAA<-sample(22:60, size=10, replace=TRUE) #Random data
df$AAA<-as.data.frame(AAA)
df$Z<-df$Z <- with(df, (AA == 'yes') * 0.25 + (AAA < 30) * 0.25) #calculated field
df

This is the header of the resulting data frame. The last column should be named Z, not AA:

   ID  AA AAA   AA
1   1 yes  56 0.25
2   2  no  53 0.00

I then tried to rename the column with colnames(df)[4] <- "Z", but got the same result.

In the RStudio viewer, the data frame looks like this:

   ID  AA.AA AAA.AAA   Z
1   1   yes    56   0.25
2   2    no    53   0.00

The problem arises when I try to compute some descriptive statistics:

library("GGally")
ggpairs(df)
plot: [1,2] [===>---------------------------] 12% est: 1s
Error in `[.data.frame`(xData, rows) : undefined columns selected

Thanks in advance

Upvotes: 0

Views: 117

Answers (2)

Ronak Shah

Reputation: 388907

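You are creating nested data frames inside your data frame: as.data.frame(AA) returns a one-column data frame, and assigning it to df$AA stores that whole data frame as a single column. Assign the vectors directly instead: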

df <- data.frame(ID=c(1:10)) #Consecutive numbers
df$AA<- sample(c("yes","no"), 10, replace=TRUE, prob = c(0.53, 0.47)) #Random data
df$AAA<-sample(22:60, size=10, replace=TRUE) #Random data
df$Z <- with(df, (AA == 'yes') * 0.25 + (AAA < 30) * 0.25) #calculated field

GGally::ggpairs(df)
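
If you already have a data frame in this state and want to repair it rather than rebuild it, something like the following may help. It is only a sketch under a few assumptions: the seed and the names is_nested, nm and inner are mine, and every nested column is assumed to hold exactly one inner column, as in the question. It also checks for matrix columns, because comparing a data frame with == returns a matrix, which is probably why Z kept showing the name AA.

```r
set.seed(42)  # arbitrary seed, only so the example is repeatable

# Rebuild the problematic structure from the question
df <- data.frame(ID = 1:10)
AA <- sample(c("yes", "no"), 10, replace = TRUE, prob = c(0.53, 0.47))
df$AA <- as.data.frame(AA)    # a one-column data frame stored as a column
AAA <- sample(22:60, size = 10, replace = TRUE)
df$AAA <- as.data.frame(AAA)  # same problem

# Flag columns that are data frames or matrices rather than plain vectors
is_nested <- vapply(df, function(x) is.data.frame(x) || is.matrix(x), logical(1))
print(is_nested)

# Replace each nested column with its first (assumed only) inner column
for (nm in names(df)[is_nested]) {
  inner <- df[[nm]]
  df[[nm]] <- if (is.data.frame(inner)) inner[[1]] else inner[, 1]
}

# With plain vectors in place, the calculated field is an ordinary numeric vector
df$Z <- with(df, (AA == "yes") * 0.25 + (AAA < 30) * 0.25)
str(df)
```

Running str(df) before and after the loop is a quick way to see the difference: a nested column is reported as a 'data.frame' inside the outer one, while a flattened column is shown as chr, int or num.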


Upvotes: 1

James Curran

Reputation: 1294

df <- data.frame(ID = 1:10,
                 AA = sample(c("yes","no"), 10, replace=TRUE, prob = c(0.53, 0.47)),
                 AAA = sample(22:60, size=10, replace=TRUE)
                 )

library(dplyr)
df <- df %>%
  mutate(Z = (AA == 'yes') * 0.25 + (AAA < 30) * 0.25)

> df
   ID  AA AAA    Z
1   1 yes  36 0.25
2   2 yes  37 0.25
3   3  no  45 0.00
4   4 yes  28 0.50
5   5 yes  52 0.25
6   6 yes  43 0.25
7   7 yes  50 0.25
8   8 yes  39 0.25
9   9 yes  59 0.25
10 10  no  32 0.00

Upvotes: 0
