Reputation: 1143

How to combine multiple character columns into a single column in an R data frame

I am working with Census data and I need to combine four character columns into a single column.

Example:

LOGRECNO STATE COUNTY  TRACT BLOCK
    60    01    001  021100  1053
    61    01    001  021100  1054
    62    01    001  021100  1055
    63    01    001  021100  1056
    64    01    001  021100  1057
    65    01    001  021100  1058

I want to create a new column that adds the strings of STATE, COUNTY, TRACT, and BLOCK together into a single string. Example:

LOGRECNO STATE COUNTY  TRACT BLOCK  BLOCKID
    60    01    001  021100  1053   010010211001053
    61    01    001  021100  1054   010010211001054
    62    01    001  021100  1055   010010211001055
    63    01    001  021100  1056   010010211001056
    64    01    001  021100  1057   010010211001057
    65    01    001  021100  1058   010010211001058

I've tried:

AL_Blocks$BLOCK_ID<- paste(c(AL_Blocks$STATE, AL_Blocks$County, AL_Blocks$TRACT,    AL_Blocks$BLOCK), collapse = "")

But this combines all rows of all four columns into a single string.

Upvotes: 31

Answers (7)

John Karuitha

Reputation: 369

The new kid on the block is the glue package:

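library(glue)

# glue_data() evaluates the template inside the data frame, giving one string per row
my_data$BLOCKID <- glue_data(my_data, "{STATE}{COUNTY}{TRACT}{BLOCK}")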

Upvotes: 2

user11300405

Reputation: 1

You can both write and read text files using any string as the separator, not just a single character. This is useful when the data contains practically every printable symbol, so that no single character can safely serve as a separator. Here are example write and read functions:

WRITE OUT Special Separator Text:

writeSepText <- function(df, fileName, separator) {
    # Join the fields of each row into one line, using the separator string
    lines <- apply(df, 1, paste, collapse = separator)
    con <- file(fileName, open = "w")
    on.exit(close(con))
    writeLines(lines, con)
    invisible(NULL)
}

Test writing out a text file separated by the string "<break>":

writeSepText(df=as.data.frame(Titanic), fileName="/Users/user/break_sep.txt", separator="<break>")

READ In text files with special separator string

readSepText <- function(fileName, separator) {
    lines <- readLines(fileName)
    # fixed = TRUE treats the separator as a literal string, not a regular expression
    records <- strsplit(lines, split = separator, fixed = TRUE)
    # Stack the split records row by row (assumes every line has the same number of fields)
    dataFrame <- as.data.frame(do.call(rbind, records), stringsAsFactors = FALSE)
    rownames(dataFrame) <- NULL
    return(dataFrame)
}

Test reading in the text file separated by "<break>":

df <- readSepText(fileName="/Users/user/break_sep.txt", separator="<break>"); df

Upvotes: 0

Sophia J

Reputation: 105

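You can use unite() from tidyr (part of the tidyverse). Set sep = "" because the default separator is "_":

library(tidyverse)

DF %>% unite(new_var, STATE, COUNTY, TRACT, BLOCK, sep = "")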

Upvotes: 4

Kou

Reputation: 175

You can try this too

AL_Blocks <- transform(AL_Blocks, BLOCKID = paste(STATE, COUNTY,
                       TRACT, BLOCK, sep = ""))

Upvotes: 6

A5C1D2H2I1M1N2O1R2T1

Reputation: 193687

You can use do.call and paste0. Try:

AL_Blocks$BLOCK_ID <- do.call(paste0, AL_Blocks[c("STATE", "COUNTY", "TRACT", "BLOCK")])

Example output:

do.call(paste0, AL_Blocks[c("STATE", "COUNTY", "TRACT", "BLOCK")])
# [1] "010010211001053" "010010211001054" "010010211001055" "010010211001056"
# [5] "010010211001057" "010010211001058"
do.call(paste0, AL_Blocks[2:5])
# [1] "010010211001053" "010010211001054" "010010211001055" "010010211001056"
# [5] "010010211001057" "010010211001058"
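
Why this works, as a rough sketch: a data frame is a list of columns, so do.call hands each selected column to paste0 as a separate argument, and paste0 then pastes them element by element. The toy data frame `d` and its column names below are made up for illustration and are not part of the Census data.

```r
# Toy data (hypothetical, not the Census file)
d <- data.frame(a = c("01", "02"), b = c("001", "002"),
                stringsAsFactors = FALSE)

# do.call spreads the selected columns out as separate arguments to paste0
via_do_call <- do.call(paste0, d[c("a", "b")])

# The same call written out by hand
by_hand <- paste0(d$a, d$b)

# Check that both approaches give the same character vector
identical(via_do_call, by_hand)
```

The do.call form is handy when the column names are held in a variable or there are too many to type out.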

You can also use unite from "tidyr", like this:

library(tidyr)
library(dplyr)
AL_Blocks %>% 
  unite(BLOCK_ID, STATE, COUNTY, TRACT, BLOCK, sep = "", remove = FALSE)
#   LOGRECNO        BLOCK_ID STATE COUNTY  TRACT BLOCK
# 1       60 010010211001053    01    001 021100  1053
# 2       61 010010211001054    01    001 021100  1054
# 3       62 010010211001055    01    001 021100  1055
# 4       63 010010211001056    01    001 021100  1056
# 5       64 010010211001057    01    001 021100  1057
# 6       65 010010211001058    01    001 021100  1058

where "AL_Blocks" is provided as:

AL_Blocks <- structure(list(LOGRECNO = c("60", "61", "62", "63", "64", "65"), 
    STATE = c("01", "01", "01", "01", "01", "01"), COUNTY = c("001", "001", 
    "001", "001", "001", "001"), TRACT = c("021100", "021100", "021100", 
    "021100", "021100", "021100"), BLOCK = c("1053", "1054", "1055", "1056",
    "1057", "1058")), .Names = c("LOGRECNO", "STATE", "COUNTY", "TRACT", 
    "BLOCK"), class = "data.frame", row.names = c(NA, -6L))

Upvotes: 22

Freda K

Reputation: 464

Or try this

DF$BLOCKID <-
  paste(DF$STATE, DF$COUNTY, 
        DF$TRACT, DF$BLOCK, sep = "")

(Here is a method to set up the dataframe for people coming into this discussion later)

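DF <- 
  data.frame(LOGRECNO = c("60", "61", "62", "63", "64", "65"),
             STATE = c("01", "01", "01", "01", "01", "01"),
             COUNTY = c("001", "001", "001", "001", "001", "001"), 
             TRACT = c("021100", "021100", "021100", "021100", "021100", "021100"), 
             BLOCK = c("1053", "1054", "1055", "1056", "1057", "1058"),
             stringsAsFactors = FALSE)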
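
One thing to watch for: if the codes are stored as numbers rather than character strings, the leading zeros are gone and a plain paste gives an ID that is too short. A minimal sketch of one way around that, assuming fixed field widths of 2, 3, 6 and 4 characters (taken from the example in the question) and using a made-up data frame `DF_num`:

```r
# Hypothetical numeric version of the example data (leading zeros already lost)
DF_num <- data.frame(STATE = c(1, 1), COUNTY = c(1, 1),
                     TRACT = c(21100, 21100), BLOCK = c(1053, 1054))

# Zero-pad each field to its assumed width: 2 + 3 + 6 + 4 = 15 characters
DF_num$BLOCKID <- with(DF_num,
                       sprintf("%02d%03d%06d%04d", STATE, COUNTY, TRACT, BLOCK))

DF_num$BLOCKID
# [1] "010010211001053" "010010211001054"
```

Reading the columns in as character in the first place avoids the problem entirely, so this is only a fallback.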

Upvotes: 5

JAponte

Reputation: 1538

Try this:

AL_Blocks$BLOCK_ID<- with(AL_Blocks, paste0(STATE, COUNTY, TRACT, BLOCK))

There was a typo in County; it should have been COUNTY. Also, you don't need the collapse argument: wrapping the columns in c() and then collapsing is what merged every row into a single string.
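
To see the difference on a small scale, here is a sketch with two made-up vectors `x` and `y` standing in for the columns:

```r
# Two short stand-in "columns" (hypothetical values)
x <- c("01", "01")
y <- c("001", "002")

# Element-wise paste: one string per row, which is what the question asks for
per_row <- paste(x, y, sep = "")

# c() stacks both vectors into one, and collapse then joins every element
all_in_one <- paste(c(x, y), collapse = "")

length(per_row)     # one string per row
length(all_in_one)  # a single string
```

In short, sep controls what goes between the arguments within each row, while collapse squashes the whole result into one string.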

I hope that helps.

Upvotes: 30