Reputation: 743
I have a CSV file imported as a data.frame. Each row should have 4 elements (4 columns), but some rows have fewer. For example:
ID col1 col2 col3 col4
id1 dA dB dC dD
id2 aA aB aC aD
id3 mA mB mC
id4 xA xB xC XD
I'm using tidyr, and when I import the data it fills each missing element with NA. In this case that is col4 of id3:
id3 mA mB mC NA
I want to fix every row that has fewer than 4 elements (like id3) by filling the missing element with "unclassified" (UNC), something like:
ID col1 col2 col3 col4
id1 dA dB dC dD
id2 aA aB aC aD
id3 mA mB mC UNC
id4 xA xB xC XD
Well, this is my code:
library(tidyr)  # provides separate() and re-exports %>%
df <- read.csv("file.csv", comment.char = "#", header = TRUE, sep = "\t")
# add the id as row name
rownames(df) <- paste("id", 1:nrow(df), sep = "")
# drop every column except the first
df[, 2:ncol(df)] <- NULL
# at this point "df" has a single column named "old_name";
# split it on the ";" character into four named columns
df <- df %>% tidyr::separate(old_name, c("col1", "col2", "col3", "col4"), sep = ";", extra = "drop")
Any suggestions?
Thanks so much.
Upvotes: 2
Views: 89
Reputation: 886928
We can use mutate_if from dplyr to replace the NA values in every character column with "UNC":
library(dplyr)
df1 %>%
mutate_if(is.character, ~ replace(., is.na(.), "UNC"))
Or in base R:
i1 <- sapply(df1, is.character)
df1[i1][is.na(df1[i1])] <- "UNC"
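To see the base R version work end to end, here is a minimal sketch on the sample rows from the question. The df1 below is a hypothetical reconstruction of what the data might look like after tidyr::separate(). The values are taken from the example table, but the way the data frame is built is an assumption.

```r
# Hypothetical reconstruction of the question's data after tidyr::separate();
# the values come from the example table, the construction is an assumption.
df1 <- data.frame(
  col1 = c("dA", "aA", "mA", "xA"),
  col2 = c("dB", "aB", "mB", "xB"),
  col3 = c("dC", "aC", "mC", "xC"),
  col4 = c("dD", "aD", NA, "XD"),
  row.names = paste0("id", 1:4),
  stringsAsFactors = FALSE
)

# Flag the character columns, then overwrite their NA cells in place
i1 <- sapply(df1, is.character)
df1[i1][is.na(df1[i1])] <- "UNC"

print(df1)
```

Only the character columns are touched, so any numeric columns would keep their NA values.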
Upvotes: 2