Roger1219
Roger1219

Reputation: 13

How to avoid adding the pre-column name after the new column name in data frame imported from .xlsx?

I am new to R Studio. I found that when I copy a column from a data frame imported from xlsx file, the new column name includes a suffix of pre-column name. But if I create the data frame in R, the new column name will not include the suffix. This problem really confused me. And I want to know how to cancel this feature. Here is my test code:

library(readxl)
> df <- read_excel("test.xlsx", sheet = "Sheet1")                                                                                                                                         
> cName <- "Column2"
> df$Column3 <- df[,cName]
> df
# A tibble: 3 × 3
  Column1 Column2 Column3$Column2
    <dbl>   <dbl>           <dbl>
1       1       4               4
2       2       5               5
3       3       6               6

This code shows the suffix when using read_excel

> df <- data.frame( Column1 = c(1,2,3),
+                   Column2 = c(4,5,6))
> cName <- "Column2"
> df$Column3 <- df[,cName]
> df
  Column1 Column2 Column3
1       1       4       4
2       2       5       5
3       3       6       6

And this code shows no suffix when creating the df in R.

I want to know how to avoid the suffix when I import a data frame form .xlsx file.

Upvotes: 1

Views: 52

Answers (1)

joran
joran

Reputation: 173727

This behavior isn't specific to the data being sourced from an Excel file, it's that that package happens to return a tibble instead of a data.frame. The two objects are largely the same, except they have different defaults for the drop argument when subsetting via [.

df <- tibble( Column1 = c(1,2,3),
                  Column2 = c(4,5,6))
cName <- "Column2"
df$Column3 <- df[,cName] # Assigning a one column tibble
df$Column3 <- df[,cName,drop = TRUE] # Assigning a vector

Indeed, if you look at the structure in the first case you'll see that what's happened probably isn't at all what you wanted:

> str(df)
tibble [3 × 3] (S3: tbl_df/tbl/data.frame)
 $ Column1: num [1:3] 1 2 3
 $ Column2: num [1:3] 4 5 6
 $ Column3: tibble [3 × 1] (S3: tbl_df/tbl/data.frame)
  ..$ Column2: num [1:3] 4 5 6

In general, it is safer (both with tibbles & data frames) to select a single column as a vector via:

df[[cName]]

Upvotes: 1

Related Questions