Reputation: 1122
I have a data frame containing different kinds of strings. I would like to duplicate each single-character string (so that A becomes AA), while NA values stay NA and two-character strings stay unchanged.
DF:
Milk Cola Juice Coffee Tea Wine
1 A NA A BD C A
2 AB NA C D CD AD
3 A BC AC D D D
4 AB B NA D CD AD
5 B C AC BD CD NA
6 AB BC C NA NA A
7 NA BC A B NA A
Desired output:
Milk Cola Juice Coffee Tea Wine
1 AA NA AA BD CC AA
2 AB NA CC DD CD AD
3 AA BC AC DD DD DD
4 AB BB NA DD CD AD
5 BB CC AC BD CD NA
6 AB BC CC NA NA AA
7 NA BC AA BB NA AA
Thank you.
Upvotes: 0
Views: 489
Reputation: 886948
We can also do this using strrep, which should be faster as it is implemented in C.
DF[] <- lapply(DF, function(x) ifelse(nchar(x)==1, strrep(x,2), x))
DF
# Milk Cola Juice Coffee Tea Wine
#1 AA <NA> AA BD CC AA
#2 AB <NA> CC DD CD AD
#3 AA BC AC DD DD DD
#4 AB BB <NA> DD CD AD
#5 BB CC AC BD CD <NA>
#6 AB BC CC <NA> <NA> AA
#7 <NA> BC AA BB <NA> AA
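To see why the NA entries survive this approach, it can help to run the individual pieces on a single character vector. The sketch below uses a made-up vector x (not part of the original data) and assumes R 3.3.0 or later, where strrep is available and nchar returns NA for a missing string.

```r
# Hypothetical input: one single-character string, one two-character string, one NA
x <- c("A", "AB", NA)

n <- nchar(x)            # 1, 2, NA: the missing string has no length
doubled <- strrep(x, 2)  # "AA", "ABAB", NA: every element repeated twice

# ifelse() picks the doubled value only where the length is exactly 1;
# where the test is NA, the result is NA as well
result <- ifelse(n == 1, doubled, x)
print(result)
# [1] "AA" "AB" NA
```

Because the test itself is NA for missing entries, no special NA handling is needed in the function passed to lapply.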
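An option using dplyr would be
library(dplyr)
# across() needs dplyr >= 1.0.0; older versions used mutate_each(funs(...)), now deprecated
DF %>%
  mutate(across(everything(), ~ ifelse(nchar(.x) == 1, strrep(.x, 2), .x)))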
Upvotes: 2
Reputation: 93813
Here's an attempt using a regular expression replacement:
dat[] <- lapply(dat, function(x) sub("^(.)$", paste(rep("\\1",2),collapse=""), x) )
Or less programmatically, but with the same result:
dat[] <- lapply(dat, function(x) sub("^(.)$", "\\1\\1", x) )
Or if you're really going to squash code, then:
dat[] <- lapply(dat, sub, pa="^(.)$", re="\\1\\1")
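As a rough illustration of why the pattern only touches single-character entries: ^(.)$ requires exactly one character between the start and the end of the string, and the replacement writes the captured character twice. The vector x below is a made-up example, not part of the original data.

```r
# Hypothetical input covering each case: one character, two characters, empty, NA
x <- c("A", "AB", "", NA)

# "^(.)$" matches only strings that consist of exactly one character;
# "\\1\\1" writes the captured character twice
out <- sub("^(.)$", "\\1\\1", x)
print(out)
# [1] "AA" "AB" ""   NA
```

Strings that do not match are returned unchanged, and sub leaves NA as NA, so no ifelse is needed with this approach.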
Where dat was:
structure(list(Milk = c("A", "AB", "A", "AB", "B", "AB", NA),
Cola = c(NA, NA, "BC", "B", "C", "BC", "BC"), Juice = c("A",
"C", "AC", NA, "AC", "C", "A"), Coffee = c("BD", "D", "D",
"D", "BD", NA, "B"), Tea = c("C", "CD", "D", "CD", "CD",
NA, NA), Wine = c("A", "AD", "D", "AD", NA, "A", "A")), .Names = c("Milk",
"Cola", "Juice", "Coffee", "Tea", "Wine"), row.names = c("1",
"2", "3", "4", "5", "6", "7"), class = "data.frame")
Upvotes: 4
Reputation: 2621
DF <- " Milk Cola Juice Coffee Tea Wine
1 A NA A BD C A
2 AB NA C D CD AD
3 A BC AC D D D
4 AB B NA D CD AD
5 B C AC BD CD NA
6 AB BC C NA NA A
7 NA BC A B NA A "
DF <- read.table(text=DF, stringsAsFactors=FALSE)
This is DF:
Milk Cola Juice Coffee Tea Wine
1 A <NA> A BD C A
2 AB <NA> C D CD AD
3 A BC AC D D D
4 AB B <NA> D CD AD
5 B C AC BD CD <NA>
6 AB BC C <NA> <NA> A
7 <NA> BC A B <NA> A
To achieve your goal, we can make use of lapply and ifelse.
DF[] <- lapply(DF, function(x) ifelse(nchar(x) == 1, paste(x, x, sep=""), x))
For each column, if the number of characters in an entry is 1, we duplicate it; otherwise, we keep the original value.
Final output:
> DF
Milk Cola Juice Coffee Tea Wine
1 AA <NA> AA BD CC AA
2 AB <NA> CC DD CD AD
3 AA BC AC DD DD DD
4 AB BB <NA> DD CD AD
5 BB CC AC BD CD <NA>
6 AB BC CC <NA> <NA> AA
7 <NA> BC AA BB <NA> AA
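If the columns were read in as factors (for example, without stringsAsFactors=FALSE in R versions before 4.0.0), nchar would fail on them. A minimal sketch of one way to guard against that is to convert to character first. The helper name double_single and the small data frame demo are made up for illustration.

```r
# Hypothetical helper: doubles single-character entries, leaves everything else alone
double_single <- function(x) {
  x <- as.character(x)                    # factors become plain strings, NA stays NA
  ifelse(nchar(x) == 1, paste0(x, x), x)  # NA test gives NA, so missing values survive
}

# Made-up data frame with factor columns to show the conversion
demo <- data.frame(Milk = c("A", "AB", NA),
                   Cola = c(NA, "B", "BC"),
                   stringsAsFactors = TRUE)

# demo[] keeps the data frame structure while replacing every column
demo[] <- lapply(demo, double_single)
print(demo)
```

After this call the columns are character vectors rather than factors, which is usually what is wanted for further string processing.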
Upvotes: 4