kfeye

Reputation: 13

Questions about columns that have different data formats within the column in R

My data has a column that mixes proportions (e.g. 0.02 for 2%) with percentage values (e.g. 5%). How do I convert it so that the whole column uses the same format? I also have another column where the format shifts between a dashed code (e.g. 15-2-2005) and a date-like form (e.g. 2/15/2005). I want everything in that column in the 15-2-2005 format. It isn't a date, and the numbers between the dashes vary, so wherever there is a "/", I would like a "-" instead.

I guess I am asking how to coerce the data into one format throughout the data file. Is it an "if" statement?

Upvotes: 0

Views: 20

Answers (1)

G5W

Reputation: 37641

Since you don't give any sample data, I will make some up.

A = c(0.5, 0.7, "22%", 0.6, "13%")
B = c("15-2-2005", "16-3-2006", "17/4/2007", 
    "18/5/2008", "19-6-2009")

MyData = data.frame(A,B)
MyData

    A         B
1 0.5 15-2-2005
2 0.7 16-3-2006
3 22% 17/4/2007
4 0.6 18/5/2008
5 13% 19-6-2009

To fix up your data, replace every "/" with "-" in the one column. In the other column, remove the "%" and then divide the values that had a "%" by 100.

## Easy one first: turn slashes into dashes
MyData$B = gsub("/", "-", MyData$B)

## Now the percents
## Record which rows contain "%" before stripping it
Percents = grep("%", MyData$A)
## Drop the "%" and convert the whole column to numeric
MyData$A = as.numeric(gsub("%", "", MyData$A))
## Rescale only the rows that were percentages
MyData$A[Percents] = MyData$A[Percents]/100

MyData
     A         B
1 0.50 15-2-2005
2 0.70 16-3-2006
3 0.22 17-4-2007
4 0.60 18-5-2008
5 0.13 19-6-2009
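
As for whether this needs an "if" statement: a plain `if` only tests a single value, but the vectorized `ifelse()` can apply the same rule to every element at once. The following is a minimal sketch on made-up vectors shaped like the sample data, not the only way to do it. The names `A_fixed`, `B_fixed` and `num` are just for illustration, and it assumes every "%" value is a plain number followed by a percent sign.

```r
# Made-up vectors with the same mixed formats as the sample data
A <- c("0.5", "0.7", "22%", "0.6", "13%")
B <- c("15-2-2005", "16-3-2006", "17/4/2007", "18/5/2008", "19-6-2009")

# Strip the "%" (if any) and convert everything to numeric
num <- as.numeric(sub("%", "", A, fixed = TRUE))

# ifelse() checks each element: divide by 100 only where a "%" was present
A_fixed <- ifelse(grepl("%", A, fixed = TRUE), num / 100, num)

# chartr() swaps single characters, here every "/" becomes "-"
B_fixed <- chartr("/", "-", B)

print(data.frame(A = A_fixed, B = B_fixed))
```

Using `fixed = TRUE` treats "%" as a literal character rather than a regular expression, which makes no difference here but avoids surprises with characters that do have a special meaning in a pattern.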

Upvotes: 1
