Peter Chung

Reputation: 1122

R add strings in dataframe

I have a data frame containing different kinds of strings. I would like to double every single-character string (for example, "A" becomes "AA"), while NA values stay NA and two-character strings stay unchanged.

 DF:
    Milk     Cola   Juice   Coffee  Tea Wine
1   A        NA     A       BD     C    A
2   AB       NA     C       D      CD   AD
3   A        BC     AC      D      D    D
4   AB       B      NA      D      CD   AD
5   B        C      AC      BD     CD   NA
6   AB       BC     C       NA     NA   A
7   NA       BC     A       B      NA   A 

 Desired output:
    Milk     Cola   Juice   Coffee  Tea Wine
1   AA       NA     AA      BD     CC   AA
2   AB       NA     CC      DD     CD   AD
3   AA       BC     AC      DD     DD   DD
4   AB       BB     NA      DD     CD   AD
5   BB       CC     AC      BD     CD   NA
6   AB       BC     CC      NA     NA   AA
7   NA       BC     AA      BB     NA   AA

Thank you.

Upvotes: 0

Views: 489

Answers (3)

akrun

Reputation: 886948

We can also do this using strrep (available from R 3.3.0), which should be faster because it is vectorized and implemented in C:

DF[] <- lapply(DF, function(x) ifelse(nchar(x)==1, strrep(x,2), x))
DF
#  Milk Cola Juice Coffee  Tea Wine
#1   AA <NA>    AA     BD   CC   AA
#2   AB <NA>    CC     DD   CD   AD
#3   AA   BC    AC     DD   DD   DD
#4   AB   BB  <NA>     DD   CD   AD
#5   BB   CC    AC     BD   CD <NA>
#6   AB   BC    CC   <NA> <NA>   AA
#7 <NA>   BC    AA     BB <NA>   AA
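The NA entries survive because nchar() returns NA for a missing character value, and ifelse() returns NA wherever its test is NA. A minimal sketch of that behaviour, using a hypothetical vector x that is not part of the original data:

```r
# Hypothetical vector covering the three cases:
# one character, two characters, missing
x <- c("A", "AB", NA)

n <- nchar(x)                            # NA for the missing entry
res <- ifelse(n == 1, strrep(x, 2), x)   # an NA test gives an NA result
print(res)
```

Note that this relies on the columns being character vectors; nchar() does not accept factors, so factor columns would need as.character() first.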

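An option using dplyr would be

library(dplyr)
# across() needs dplyr >= 1.0.0; it supersedes the deprecated mutate_each()/funs()
DF %>%
   mutate(across(everything(), ~ ifelse(nchar(.x) == 1, strrep(.x, 2), .x)))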

Upvotes: 2

thelatemail

Reputation: 93813

Here's an attempt using a regular expression replacement:

dat[] <- lapply(dat, function(x) sub("^(.)$", paste(rep("\\1",2),collapse=""), x) )

Or less programmatically, but with the same result:

dat[] <- lapply(dat, function(x) sub("^(.)$", "\\1\\1", x) )

Or if you're really going to squash code, then:

dat[] <- lapply(dat, sub, pa="^(.)$", re="\\1\\1")

Where dat was:

structure(list(Milk = c("A", "AB", "A", "AB", "B", "AB", NA), 
    Cola = c(NA, NA, "BC", "B", "C", "BC", "BC"), Juice = c("A", 
    "C", "AC", NA, "AC", "C", "A"), Coffee = c("BD", "D", "D", 
    "D", "BD", NA, "B"), Tea = c("C", "CD", "D", "CD", "CD", 
    NA, NA), Wine = c("A", "AD", "D", "AD", NA, "A", "A")), .Names = c("Milk", 
"Cola", "Juice", "Coffee", "Tea", "Wine"), row.names = c("1", 
"2", "3", "4", "5", "6", "7"), class = "data.frame")

Upvotes: 4

pe-perry

Reputation: 2621

DF <- "    Milk     Cola   Juice   Coffee  Tea Wine
1   A        NA     A       BD     C    A
2   AB       NA     C       D      CD   AD
3   A        BC     AC      D      D    D
4   AB       B      NA      D      CD   AD
5   B        C      AC      BD     CD   NA
6   AB       BC     C       NA     NA   A
7   NA       BC     A       B      NA   A "
DF <- read.table(text=DF, stringsAsFactors=FALSE)

This is DF:

  Milk Cola Juice Coffee  Tea Wine
1    A <NA>     A     BD    C    A
2   AB <NA>     C      D   CD   AD
3    A   BC    AC      D    D    D
4   AB    B  <NA>      D   CD   AD
5    B    C    AC     BD   CD <NA>
6   AB   BC     C   <NA> <NA>    A
7 <NA>   BC     A      B <NA>    A

To achieve your goal, we can make use of lapply and ifelse.

DF[] <- lapply(DF, function(x) ifelse(nchar(x) == 1, paste(x, x, sep=""), x))

For each column, if the number of characters in an entry is 1, we duplicate it; otherwise, we keep the entry as it is. NA entries stay NA because nchar returns NA for them.
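One detail worth checking is that paste on its own would turn a missing value into the string "NANA"; it is the NA test inside ifelse that keeps the entry missing. A short sketch on a hypothetical vector x (an illustration, not part of DF):

```r
# Hypothetical vector: one character, two characters, missing
x <- c("B", "CD", NA)

# paste() converts NA to the text "NA", so the last element becomes "NANA"
pasted <- paste(x, x, sep = "")

# nchar(x) is NA for the missing entry, so ifelse() returns NA there
# and never uses the "NANA" value
res <- ifelse(nchar(x) == 1, pasted, x)
print(res)
```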

Final output:

> DF
  Milk Cola Juice Coffee  Tea Wine
1   AA <NA>    AA     BD   CC   AA
2   AB <NA>    CC     DD   CD   AD
3   AA   BC    AC     DD   DD   DD
4   AB   BB  <NA>     DD   CD   AD
5   BB   CC    AC     BD   CD <NA>
6   AB   BC    CC   <NA> <NA>   AA
7 <NA>   BC    AA     BB <NA>   AA

Upvotes: 4
