Reputation: 49

How to put entire row as column names?

For the following dataframe, I would like to change the column names with the row where the 1st column starts with a word or words. Here it's the 2nd row and the word Company. However, the row can be different like 1st, 5th or 10th row with different dataframe, and word can also be different like Investment and others.

structure(list(X1 = c("", "Company #", "Investments:"
), X2 = c("", "Type", ""), X3 = c("", "Reference", 
""), X4 = c(NA_real_, NA_real_, NA_real_), X5= c("", "Footnotes", 
""), X6 = c(NA_character_, NA_character_, NA_character_)), row.names = c(NA, 
3L), class = "data.frame")

          X1             X2       X3        X4       X5       X6
    <chr>              <chr>    <chr>      <dbl>   <chr>    <chr>
1                                           NA               NA
2   Company #           Type   Reference    NA   Footnotes   NA
3   Investments:                            NA               NA

I'm thinking first to get the row number when the 1st column starts with a word/words, and then use that row number to change to column names, or maybe there are better ways to do that.

names(my_df)<- my_df[row_number,]
my_df <- my_df[-row_number,]

Desired Output

        Company #   Type   Reference    NA   Footnotes    NA
    <chr>           <chr>   <chr>      <dbl>   <chr>    <chr>
3   Investments:                         NA              NA

Upvotes: 2

Answers (6)

Andre Wildberg

Reputation: 19211

This uses a keyword in the first column to find the row, then makes sure it has no duplicate names with make.unique and no NAs (character or numeric) with replace.

key <- "Company #"

str <- as.character(dat[dat[,1] == key,][1,])

colnames(dat) <- make.unique(
  replace(str, str %in% "NA" | is.na(str), "Missing"), sep="_")

result

dat
     Company # Type Reference Missing Footnotes Missing_1
1                                  NA                <NA>
2    Company # Type Reference      NA Footnotes      <NA>
3 Investments:                     NA                <NA>

If the first non-empty cell should be picked use this

str <- as.character(dat[nchar(dat[,1]) != 0, ][1,])

colnames(dat) <- make.unique(
  replace(str, str %in% "NA" | is.na(str), "Missing"), sep="_")

Upvotes: 1

juanbarq

Reputation: 399

You can look for your names using which on first column, then assign to colnames with names and finally delete that row. All this in base R:

df <- structure(list(X1 = c("", "Company #", "Investments:"
), X2 = c("", "Type", ""), X3 = c("", "Reference",""), X4 = c(NA_real_, NA_real_, NA_real_), X5= c("", "Footnotes", ""), 
X6 = c(NA_character_, NA_character_, NA_character_)), row.names = c(NA, 3L), class = "data.frame")

row_names <- which(nchar(data.frame(df)[, "X1"]) > 1)[1]

names(df) <- df[row_names, ]
df[-c(row_names),]

Output:

     Company # Type Reference NA Footnotes   NA
1                             NA           <NA>
3 Investments:                NA           <NA>

Upvotes: 1

arg0naut91

Reputation: 14774

You could try:

idx <- which(grepl('[^A-Za-z]', my_df$X1))[1]
colnames(my_df) <- my_df[idx, ]
my_df <- my_df[(idx + 1):nrow(my_df), ]

Output:

     Company # Type Reference NA Footnotes   NA
3 Investments:                NA           <NA>

This would check if any row begins with a letter, take the first occurrence as column names and keep only rows following it.

Upvotes: 2

Maël

Reputation: 52359

#row number of the first word in the first column
row_n <- min(which(nzchar(my_df[[1]])))
janitor::row_to_names(my_df, row_n)

output

#     Company # Type Reference NA Footnotes   NA
#3 Investments:                NA           <NA>

Note that you will have non-unique column names (NA) if you do so. You can use clean_names to quickly remedy this.

Upvotes: 3

Quinten

Reputation: 41573

You could use which to get the row with the string like this:

idx <- which("Company #" == my_df)
names(my_df) <- my_df[idx, ]
my_df <- my_df[-idx,]
my_df
#>      Company # Type Reference NA Footnotes   NA
#> 1                             NA           <NA>
#> 3 Investments:                NA           <NA>

^{Created on 2023-01-05 with reprex v2.0.2}

Upvotes: 1

TarJae

Reputation: 79246

Base R:

colnames(df) <- df[2,]
df <- df[-2,]
df

     Company # Type Reference NA Footnotes   NA
1                             NA           <NA>
3 Investments:                NA           <NA>

Upvotes: 1

How to put entire row as column names?

Answers (6)

result

Related Questions