Reputation: 97
For reasons specific to my R program, I want to assign column names and row names from an existing row and column of a data frame. That is, the first row should become the column names, and the first column should become the row names.
I first thought it was easy, using:
colnames(myDataFrame) <- myDataFrame[1,]
rownames(myDataFrame) <- myDataFrame[,1]
as is also suggested in this topic.
But I have a lot of cases to handle in the first row and first column of my data frame: only text, text mixed with numbers, only numbers... That's why this sometimes does not work. See an example with only text in the first row:
I first load my data frame with no headers at all:
> tab <- read.table(file, header = FALSE, sep = "\t")
> tab
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 TEST this is only text hoping it will work
2 I 4 0 0 0 0 0 0 1
3 really 7 6 6 3 10 6 10 10
4 hope 187 141 140 129 130 157 138 168
Here is my data frame without row and column names. I want "TEST this is only text hoping it will work" to become my column names. This does not work:
> colnames(tab) <- tab[1,]
> tab
2 10 9 9 10 8 9 8 9
1 TEST this is only text hoping it will work
2 I 4 0 0 0 0 0 0 1
3 really 7 6 6 3 10 6 10 10
4 hope 187 141 140 129 130 157 138 168
Whereas this works:
> colnames(tab) <- as.character(unlist(tab[1,]))
> tab
TEST this is only text hoping it will work
1 TEST this is only text hoping it will work
2 I 4 0 0 0 0 0 0 1
3 really 7 6 6 3 10 6 10 10
4 hope 187 141 140 129 130 157 138 168
I thought the problem was that R sometimes treats the first column or row as a factor. But as you can see:
> is.factor(tab[1,])
FALSE
So it can fail even though R does not report the row as a factor.
I tried to use as.character(unlist()) in my program, but in some other cases that I might encounter, it no longer works. See an example with text and numbers in the first row:
> otherTab <- read.table(otherFile, header = FALSE, sep = "\t")
> otherTab
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 TEST this45 is 486text 725 with ca257 some numbers
2 number45 4 0 0 0 0 0 0 1
3 254every 7 6 6 3 10 6 10 10
4 where 187 141 140 129 130 157 138 168
> colnames(otherTab) <- as.character(unlist(otherTab[1,]))
> otherTab
6 10 9 7 725 8 9 8 9
1 TEST this45 is 486text 725 with ca257 some numbers
2 number45 4 0 0 0 0 0 0 1
3 254every 7 6 6 3 10 6 10 10
4 where 187 141 140 129 130 157 138 168
So how can I handle these different cases in an easy way (this seems like such a simple problem)? Many thanks in advance.
Upvotes: 3
Views: 952
Reputation: 21621
This happens because, in your initial data frame, V5 is a column of type "int", not a factor (so you have two different types in your first row):
#> str(df)
#'data.frame': 4 obs. of 9 variables:
# $ V1: Factor w/ 4 levels "254every","TEST",..: 2 3 1 4
# $ V2: Factor w/ 4 levels "187","4","7",..: 4 2 3 1
# $ V3: Factor w/ 4 levels "0","141","6",..: 4 1 3 2
# $ V4: Factor w/ 4 levels "0","140","486text",..: 3 1 4 2
# $ V5: int 725 0 3 129
# $ V6: Factor w/ 4 levels "0","10","130",..: 4 1 2 3
# $ V7: Factor w/ 4 levels "0","157","6",..: 4 1 3 2
# $ V8: Factor w/ 4 levels "0","10","138",..: 4 1 2 3
# $ V9: Factor w/ 4 levels "1","10","168",..: 4 1 2 3
All elements of a vector must be of the same type. When you unlist() the first row and pass the result to colnames(), you actually pass an "int" vector, because R coerces the elements to a common type and each factor cell contributes its integer code rather than its label:
#> str(unlist(df[1,]))
# Named int [1:9] 2 4 4 3 725 4 4 4 4
# - attr(*, "names")= chr [1:9] "V1" "V2" "V3" "V4" ...
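The coercion can be reproduced on a much smaller row. The sketch below uses a hypothetical one-row data frame (the name first_row and its columns a, b, c are made up for illustration) to contrast a mixed factor/integer row with an all-factor row:

```r
# Hypothetical toy row: two factor columns and one integer column
first_row <- data.frame(a = factor("TEST"), b = factor("this45"), c = 725L)

# Mixed factor/integer input: unlist() falls back to the integer codes
mixed <- unlist(first_row)
mixed              # a = 1, b = 1, c = 725 (factor codes, not labels)
is.integer(mixed)  # TRUE

# All-factor input: unlist() returns a factor, so the labels survive
all_factors <- unlist(first_row[, c("a", "b")])
as.character(all_factors)  # "TEST" "this45"
```

This is why the approach from the question works on a row that contains only text but breaks as soon as one column is read as a number.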
If you modify the structure of your data frame to specify that column V5 is a factor, your initial method would work:
df[,5] <- as.factor(df[,5])
colnames(df) <- unlist(df[1,])
You would get:
#> df
# TEST this45 is 486text 725 with ca257 some numbers
#1 TEST this45 is 486text 725 with ca257 some numbers
#2 number45 4 0 0 0 0 0 0 1
#3 254every 7 6 6 3 10 6 10 10
#4 where 187 141 140 129 130 157 138 168
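If you do not know in advance which columns were read as numbers, a variant of the same idea is to convert every column to character first. This is only a sketch on a hypothetical three-column data frame (df2 is a made-up name), and it assumes you are fine with all columns becoming character:

```r
# Hypothetical small data frame mimicking the mixed factor/integer case
df2 <- data.frame(
  V1 = factor(c("TEST", "number45", "where")),
  V2 = factor(c("this45", "4", "187")),
  V3 = c(725L, 0L, 129L)
)

# Convert every column to character; assigning into df2[] keeps it a data frame
df2[] <- lapply(df2, as.character)

# The first row is now all character, so unlist() yields the labels
colnames(df2) <- unlist(df2[1, ])
colnames(df2)  # "TEST" "this45" "725"
```

Reading the file with stringsAsFactors = FALSE in read.table() avoids the factors in the first place, but a column that contains only numbers is still read as numeric, so the conversion step remains useful.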
If you don't want to modify your column types, you could apply as.character() to each element of the first row before passing the result to colnames():
colnames(df) <- lapply(df[1,], as.character)
Which results in:
#> df
# TEST this45 is 486text 725 with ca257 some numbers
#1 TEST this45 is 486text 725 with ca257 some numbers
#2 number45 4 0 0 0 0 0 0 1
#3 254every 7 6 6 3 10 6 10 10
#4 where 187 141 140 129 130 157 138 168
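The same per-element conversion can be wrapped in a small function that also handles the row names asked about in the question. The helper below is hypothetical (promote_names is not a base R function), and it assumes that you want the first row and first column removed once they have been used as names, and that the values in the first column are unique:

```r
# Hypothetical helper (not part of base R): use the first row as column
# names and the first column as row names, then drop both from the data.
promote_names <- function(d) {
  # as.character() per cell turns factor cells into labels, numbers into text
  col_labels <- vapply(d[1, ], as.character, character(1))
  row_labels <- as.character(d[[1]])

  out <- d[-1, -1, drop = FALSE]
  colnames(out) <- col_labels[-1]
  rownames(out) <- row_labels[-1]  # row names must be unique
  out
}

# Toy input with a factor header cell next to an integer one
toy <- data.frame(
  V1 = factor(c("TEST", "number45", "where")),
  V2 = factor(c("this45", "4", "187")),
  V3 = c(725L, 0L, 129L)
)
res <- promote_names(toy)
colnames(res)  # "this45" "725"
rownames(res)  # "number45" "where"
```

The remaining columns keep their original types, so a column that was read as a factor because of its text header is still a factor afterwards and may need converting separately.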
Data
structure(list(V1 = structure(c(2L, 3L, 1L, 4L), .Label = c("254every",
"TEST", "number45", "where"), class = "factor"), V2 = structure(c(4L,
2L, 3L, 1L), .Label = c("187", "4", "7", "this45"), class = "factor"),
V3 = structure(c(4L, 1L, 3L, 2L), .Label = c("0", "141",
"6", "is"), class = "factor"), V4 = structure(c(3L, 1L, 4L,
2L), .Label = c("0", "140", "486text", "6"), class = "factor"),
V5 = c(725L, 0L, 3L, 129L), V6 = structure(c(4L, 1L, 2L,
3L), .Label = c("0", "10", "130", "with"), class = "factor"),
V7 = structure(c(4L, 1L, 3L, 2L), .Label = c("0", "157",
"6", "ca257"), class = "factor"), V8 = structure(c(4L, 1L,
2L, 3L), .Label = c("0", "10", "138", "some"), class = "factor"),
V9 = structure(c(4L, 1L, 2L, 3L), .Label = c("1", "10", "168",
"numbers"), class = "factor")), .Names = c("V1", "V2", "V3",
"V4", "V5", "V6", "V7", "V8", "V9"), class = "data.frame", row.names = c("1",
"2", "3", "4"))
Upvotes: 5