Reputation: 61
If I have a dataframe as follows, with a combination of text values and NA cells:
id | Col1 | Col2 | Col3 | Col4 | Col5 | Col... |
---|---|---|---|---|---|---|
id1 | NA | NA | sample | NA | weight | etc |
id2 | NA | size | NA | NA | NA | etc |
id3 | volume | size | sample | NA | NA | etc |
id4 | NA | NA | NA | qty | NA | etc |
id5 | NA | NA | sample | qty | weight | etc |
Is it possible to rename the header with the most frequently used value in that column, as follows?
id | volume | size | sample | qty | weight |
---|---|---|---|---|---|
id1 | NA | NA | sample | NA | weight |
id2 | NA | size | NA | NA | NA |
id3 | volume | size | sample | NA | NA |
id4 | NA | NA | NA | qty | NA |
id5 | NA | NA | sample | qty | weight |
Upvotes: 0
Views: 127
Reputation: 389325
You can get the most frequent value in each column using Mode
function from here.
Mode <- function(x) {
ux <- unique(na.omit(x))
ux[which.max(tabulate(match(x, ux)))]
}
Apply it for each column and change the column name.
names(df)[-1] <- sapply(df[-1], Mode)
df
# id volume size sample qty weight
#1 id1 <NA> <NA> sample <NA> weight
#2 id2 <NA> size <NA> <NA> <NA>
#3 id3 volume size sample <NA> <NA>
#4 id4 <NA> <NA> <NA> qty <NA>
#5 id5 <NA> <NA> sample qty weight
Upvotes: 1
Reputation: 102880
Try the code below
> cbind(df[1], setNames(df[-1], sapply(df[-1], function(x) unique(na.omit(x)))))
id volume size sample qty weight
1 id1 <NA> <NA> sample <NA> weight
2 id2 <NA> size <NA> <NA> <NA>
3 id3 volume size sample <NA> <NA>
4 id4 <NA> <NA> <NA> qty <NA>
5 id5 <NA> <NA> sample qty weight
Data
> dput(df)
structure(list(id = c("id1", "id2", "id3", "id4", "id5"), Col1 = c(NA,
NA, "volume", NA, NA), Col2 = c(NA, "size", "size", NA, NA),
Col3 = c("sample", NA, "sample", NA, "sample"), Col4 = c(NA,
NA, NA, "qty", "qty"), Col5 = c("weight", NA, NA, NA, "weight"
)), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 1