Reputation: 4236
I'm looking to load and process a CSV file with seven variables: one is a grouping variable / factor (data$hashtag), and six are categories (data$support and others) coded with either an "X" or an "x" (or left blank).
data <- read.csv("maet_coded_tweets.csv", stringsAsFactors = FALSE)
names(data) <- c("hashtag", "support", "contributeConversation", "otherCommunities", "buildCommunity", "engageConversation", "unclear")
str(data)
'data.frame': 854 obs. of 7 variables:
$ hashtag : chr "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" ...
$ support : chr "x" "x" "x" "x" ...
$ contributeConversation: chr "" "" "" "" ...
$ otherCommunities : chr "" "" "" "" ...
$ buildCommunity : chr "" "" "" "" ...
$ engageConversation : chr "" "" "" "" ...
$ unclear : chr "" "" "" "" ...
When I use a function to recode "X" or "x" to 1 and "" (blank) to 0, the columns end up as character type, not numeric as intended.
recode <- function(x) {
x[x=="x"] <- 1
x[x=="X"] <- 1
x[x==""] <- 0
x
}
data[] <- lapply(data, recode)
str(data)
'data.frame': 854 obs. of 7 variables:
$ hashtag : chr "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" ...
$ support : chr "1" "1" "1" "1" ...
$ contributeConversation: chr "0" "0" "0" "0" ...
$ otherCommunities : chr "0" "0" "0" "0" ...
$ buildCommunity : chr "0" "0" "0" "0" ...
$ engageConversation : chr "0" "0" "0" "0" ...
$ unclear : chr "0" "0" "0" "0" ...
When I tried to coerce the characters using as.numeric() in the function, it still didn't work. What gives? Why are the variables treated as characters, and how do I convert character variables to numeric?
Upvotes: 0
Views: 669
Reputation: 4826
Using mapvalues from plyr:
library(plyr)
data$support <- as.numeric(mapvalues(data$support, c("X", "x", ""), c(1, 1, 0)))
Using replace:
data$support <- replace(x <- data$support, x == "X", 1)
data$support <- replace(x <- data$support, x == "x", 1)
data$support <- replace(x <- data$support, x == "", 0)
# the replacements leave a character vector, so convert it at the end
data$support <- as.numeric(data$support)
Upvotes: 1
Reputation: 19025
How about:
recode <- function(x) {
ifelse(x %in% c('X','x'), 1,0)
}
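Explanation: the steps in your function are evaluated sequentially, not simultaneously, and a vector can hold only one type. So when you assign 1 to part of a character vector, the number is coerced to the string "1" and the vector stays character. ifelse() builds a new numeric vector instead.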
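To make the coercion visible, here is a minimal sketch on toy data. The data frame dat and the helper name code_cols are made up for illustration and are not from the question's file. It also restricts the recode to the coding columns, since passing every column through a function like this would turn the hashtag column into 0s as well.

```r
# Toy stand-in for the coded tweets (hypothetical values, not the real file)
dat <- data.frame(
  hashtag = c("#capstoneisfun", "#capstoneisfun", "#other"),
  support = c("x", "", "X"),
  unclear = c("", "X", ""),
  stringsAsFactors = FALSE
)

# Assigning a number into a character vector coerces the number, not the vector
v <- dat$support
v[v == "x"] <- 1
class(v)  # still "character"

# Vectorised recode that returns a numeric vector directly
recode <- function(x) as.numeric(x %in% c("X", "x"))

# Recode only the coding columns so hashtag is left untouched
code_cols <- setdiff(names(dat), "hashtag")
dat[code_cols] <- lapply(dat[code_cols], recode)
str(dat)
```

With the real data, the same pattern should work by swapping dat for data, assuming hashtag is the only column that is not a coding column.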
Upvotes: 2
Reputation: 158
What about something like this?
# sample data with support being a character vector
myDat <- data.frame(support = c("X", "X", "0", "x", "0"), a = 1:5, stringsAsFactors = FALSE)
# convert to a factor and check the order of the levels
myDat$support <- as.factor(myDat$support)
levels(myDat$support)
#"0" "x" "X"
# just to see that it worked make an additional variable
myDat$supportrecoded <- myDat$support
# change levels and convert
levels(myDat$supportrecoded) <- c("0","1","1")
myDat$supportrecoded <- as.integer(as.character(myDat$supportrecoded ))
Upvotes: 1