Joshua Rosenberg

Reputation: 4236

Converting character values to numeric values in R with a function

I'm trying to load and process a CSV file with seven variables: one is a grouping variable / factor (data$hashtag) and six are categories (data$support and others), each marked with either an "X" or "x" or left blank.

data <- read.csv("maet_coded_tweets.csv", stringsAsFactors = F)

names(data) <- c("hashtag", "support", "contributeConversation", "otherCommunities", "buildCommunity", "engageConversation", "unclear")

str(data)

'data.frame':   854 obs. of  7 variables:
 $ hashtag               : chr  "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" ...
 $ support               : chr  "x" "x" "x" "x" ...
 $ contributeConversation: chr  "" "" "" "" ...
 $ otherCommunities      : chr  "" "" "" "" ...
 $ buildCommunity        : chr  "" "" "" "" ...
 $ engageConversation    : chr  "" "" "" "" ...
 $ unclear               : chr  "" "" "" "" ...

When I use a function to recode "X" or "x" to 1 and "" (blank) to 0, the resulting columns are character type, not numeric as intended.

recode <- function(x) {

  x[x=="x"] <- 1
  x[x=="X"] <- 1
  x[x==""] <- 0
  x
}

data[] <- lapply(data, recode)

str(data)

'data.frame':   854 obs. of  7 variables:
 $ hashtag               : chr  "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" ...
 $ support               : chr  "1" "1" "1" "1" ...
 $ contributeConversation: chr  "0" "0" "0" "0" ...
 $ otherCommunities      : chr  "0" "0" "0" "0" ...
 $ buildCommunity        : chr  "0" "0" "0" "0" ...
 $ engageConversation    : chr  "0" "0" "0" "0" ...
 $ unclear               : chr  "0" "0" "0" "0" ...

When I tried to coerce the characters using as.numeric() inside the function, it still didn't work. Why are the variables treated as characters, and how can I convert character variables to numeric?

Upvotes: 0

Views: 669

Answers (3)

narendra-choudhary

Reputation: 4826

Using mapvalues from plyr.

library(plyr)  # provides mapvalues()
data$support <- as.numeric(mapvalues(data$support, c("X", "x", ""), c(1, 1, 0)))

Using replace.

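# replace() keeps the vector as character, so "1" and "0" are stored as strings
data$support <- replace(data$support, data$support == "X", 1)
data$support <- replace(data$support, data$support == "x", 1)
data$support <- replace(data$support, data$support == "", 0)
# as.numeric() converts the strings; numeric() would only allocate a zero vector
data$support <- as.numeric(data$support)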
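
If the only goal is a 0/1 column, the comparison itself can be converted to numeric, which avoids the chain of replace() calls. A minimal sketch on a made-up vector (support and support_num are illustrative names, not the question's data):

```r
# Toy stand-in for one coded column; the values are invented for illustration
support <- c("x", "", "X", "")

# toupper() folds "x" and "X" together; the comparison gives TRUE/FALSE,
# and as.numeric() turns that into 1/0
support_num <- as.numeric(toupper(support) == "X")
```

This treats anything other than "x" or "X" as 0, so it assumes the column has no other codes.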

Upvotes: 1

C8H10N4O2

Reputation: 19025

How about:

recode <- function(x) {
  ifelse(x %in% c('X','x'), 1,0)
}

Explanation: the steps in the function are evaluated sequentially, not simultaneously. When you assign 1 to part of a character vector, the vector stays character and the 1 is coerced to "1".
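
To make the coercion visible, and to show one way of applying the function without touching the grouping column, here is a rough sketch. The small data frame and the name coded are made up for illustration; it assumes the grouping column is called hashtag, as in the question.

```r
# Toy vector standing in for one coded column (values are made up)
x <- c("x", "", "X")

# Assigning a number into a character vector coerces the number, not the vector
x[x == "x"] <- 1
class(x)

# ifelse() builds a new numeric vector instead of modifying the character one
recode <- function(x) {
  ifelse(x %in% c("X", "x"), 1, 0)
}

# Small stand-in for the question's data frame
data <- data.frame(
  hashtag = c("#capstoneisfun", "#capstoneisfun", "#other"),
  support = c("x", "", "X"),
  unclear = c("", "x", ""),
  stringsAsFactors = FALSE
)

# Recode every column except the grouping variable
coded <- setdiff(names(data), "hashtag")
data[coded] <- lapply(data[coded], recode)
str(data)
```

Restricting lapply() to the coded columns matters because recode() would otherwise turn every hashtag into 0.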

Upvotes: 2

sam

Reputation: 158

What about something like this?

# sample data with support being a character vector
myDat <- data.frame(support = c("X", "X", "0", "x", "0"), a = 1:5, stringsAsFactors = FALSE)
# convert to a factor and check the order of the levels
myDat$support <- as.factor(myDat$support)
levels(myDat$support)
# "0" "x" "X"  (the order of "x" and "X" depends on the locale; both map to "1" below)
# just to see that it worked make an additional variable
myDat$supportrecoded <- myDat$support
# change levels and convert
levels(myDat$supportrecoded) <- c("0","1","1")
myDat$supportrecoded <- as.integer(as.character(myDat$supportrecoded ))
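
A variant that does not rely on the order of the levels: assigning a named list to levels() merges the listed old levels under each new name. This is only a sketch on the same sample data; f is a throwaway name introduced here.

```r
# Same sample data as above
myDat <- data.frame(support = c("X", "X", "0", "x", "0"), a = 1:5,
                    stringsAsFactors = FALSE)

# Build the factor, then merge levels by name rather than by position
f <- factor(myDat$support)
levels(f) <- list("0" = "0", "1" = c("x", "X"))

# Go through character first so the labels, not the internal codes, are converted
myDat$supportrecoded <- as.integer(as.character(f))
```

Any value not named in the list becomes NA, which makes unexpected codes easy to spot.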

Upvotes: 1
