user2105555

Assess each row of a factor in R

I have a factor with 1000 rows and 848 levels. Each row holds comma-separated values, and some rows are empty (shown as 0 in the sample below). For each row, I want to count the number of elements (one element = 1, two elements = 2, empty row = 0, etc.). Put another way: I want to convert the factor into a data.frame, changing the data type from factor to numeric while keeping the values in each row.

v.m.two <- Output[,1]
v.m.two <- data.frame(v.m.two)
class(v.m.two)
[1] data.frame
class(v.m.two[1,])
[1] factor
dim(v.m.two)
[1] 1000 1
v.m.two[1,]
[1] 848 Levels: 0 1000 1002, 4875, 4082, 1952 1015, 2570, 3524 1017 1020, 1576 ... 983, 4381,
2256, 4361, 4271

Any suggestions?

           v.m.two
1       2633, 4868
2        126, 4860
3                0
4        122, 4762
5             4256
6 2933, 2892, 2389

Basically, I want to count the values in each row (e.g., row 1 is 2, row 2 is 2, row 3 is 0, etc.).

Upvotes: 0

Views: 216

Answers (2)

Rich Scriven

Reputation: 99371

Your values contain commas, which is why the column was read in as a factor. Try scan:

scan(text=with(v.m.two, levels(v.m.two)[v.m.two]), sep=",", what=integer())
# Read 11 items
# [1] 2633 4868  126 4860    0  122 4762 4256 2933 2892 2389

To count the elements per row and convert to numeric, you can also use strsplit:

s <- strsplit(as.character(v.m.two[[1]]), ", ")
vapply(s, length, integer(1L)) ## row 3 is actually 1 if there's a zero there
# [1] 2 2 1 2 1 3
as.numeric(do.call(c, s))
# [1] 2633 4868  126 4860    0  122 4762 4256 2933 2892 2389
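
As the comment above notes, strsplit counts the lone 0 in row 3 as one element. The following is a minimal base R sketch of one way to get a count of 0 for such rows. It assumes that a row containing only "0" means "empty", as the question suggests; the names parts, values, is_empty and counts are illustrative only.

```r
# Sample data copied from the question (a one-column data frame holding a factor)
v.m.two <- data.frame(
  v.m.two = factor(c("2633, 4868", "126, 4860", "0",
                     "122, 4762", "4256", "2933, 2892, 2389"))
)

# Split each row on commas, tolerating optional whitespace after the comma
parts <- strsplit(as.character(v.m.two[[1]]), ",\\s*")

# Convert every piece to numeric, giving one numeric vector per row
values <- lapply(parts, as.numeric)

# Assumption: a row holding only 0 marks an empty row, so replace it
# with a zero-length vector before counting
is_empty <- vapply(values, function(x) length(x) == 1L && x == 0, logical(1L))
values[is_empty] <- list(numeric(0))

# Number of elements per row
counts <- lengths(values)
counts
# [1] 2 2 0 2 1 3
```

Keeping values as a list preserves which numbers belong to which row, which a flat vector from scan or do.call(c, s) does not.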

Upvotes: 1

akrun

Reputation: 887901

1. Converting factor to numeric

  • If you want to convert the factor column to numeric and have separate columns based on the number of elements in each row, you could use cSplit from splitstackshape.

     library(splitstackshape)
     res <- cSplit(v.m.two, 'v.m.two', sep=",")
     res
     #    v.m.two_1 v.m.two_2 v.m.two_3
     #1:      2633      4868        NA
     #2:       126      4860        NA
     #3:         0        NA        NA
     #4:       122      4762        NA
     #5:      4256        NA        NA
     #6:      2933      2892      2389
    
     str(res)
     #Classes ‘data.table’ and 'data.frame':   6 obs. of  3 variables:
     # $ v.m.two_1: int  2633 126 0 122 4256 2933
     # $ v.m.two_2: int  4868 4860 NA 4762 NA 2892
     # $ v.m.two_3: int  NA NA NA NA NA 2389
    
  • If you need a vector, you could use stri_split from stringi

      library(stringi)
      as.numeric(unlist(stri_split(v.m.two[,1], regex=",")))
      #[1] 2633 4868  126 4860    0  122 4762 4256 2933 2892 2389
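
If installing splitstackshape is not an option, a similar wide layout can be approximated in base R. This is only a sketch under the assumption that the sample data below matches yours; res_base, parts and width are illustrative names, and the result is a plain data.frame of doubles rather than the data.table of integers that cSplit returns.

```r
# Sample data copied from the "data" section of this answer's example
v.m.two <- data.frame(
  v.m.two = factor(c("2633, 4868", "126, 4860", "0",
                     "122, 4762", "4256", "2933, 2892, 2389"))
)

# Split each row on commas (optional whitespace after the comma)
parts <- strsplit(as.character(v.m.two[[1]]), ",\\s*")

# Width of the widest row decides how many columns are needed
width <- max(lengths(parts))

# Indexing past the end of a vector yields NA, which pads short rows
padded <- do.call(rbind, lapply(parts, function(x) as.numeric(x)[seq_len(width)]))

# Name the columns in the same style as the cSplit output above
res_base <- as.data.frame(padded)
names(res_base) <- paste0("v.m.two_", seq_len(width))
res_base
```

Rows with fewer values than the widest row are filled with NA on the right, mirroring the shape of res.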
    

2. Counting values in row

  • To count the values in each row of v.m.two, you could count either from the res above or from v.m.two itself. In the first option, we count the non-NA values in each row of res and then multiply by the logical index derived from whether the first column of v.m.two is 0 or not. The TRUE values (i.e. != 0) keep their count, while FALSE coerces to 0, so 0 * value = 0.

      (v.m.two[,1]!=0)*(rowSums(!is.na(res)))
      #[1] 2 2 0 2 1 3    
    
  • You could also use stri_count from stringi, which should be fast (see the related question "counting occurrence of particular letter in vector of words in r"). As above, you can either multiply by the logical index or use ifelse. The regex can match either runs of digits or commas. If you count commas, make sure to add 1, because n values are separated by n - 1 commas.

      ifelse(v.m.two[,1]!=0, stri_count(v.m.two[,1], regex="\\d+"), 0)
      # [1] 2 2 0 2 1 3
      #Or
    
      (v.m.two[,1]!=0) *stri_count(v.m.two[,1], regex="\\d+")
      #[1] 2 2 0 2 1 3
      #Or   
      (v.m.two[,1]!=0) *(stri_count(v.m.two[,1], regex=",") +1)
      #[1] 2 2 0 2 1 3
    
  • Another option to count would be to use gsub and nchar from base R.

      (v.m.two[,1]!=0) *( nchar(gsub("[^,]", "", v.m.two[,1]))+1)
      #[1] 2 2 0 2 1 3
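
The digit-based and comma-based counts above should always agree, since n values are separated by n - 1 commas. A minimal base R sketch of that cross-check follows, using regmatches and gregexpr as a stand-in for stri_count. It assumes the sample data shown here and that a lone "0" marks an empty row; n_digits, n_commas, by_digits and by_commas are illustrative names.

```r
# Sample data copied from this answer's example
v.m.two <- data.frame(
  v.m.two = factor(c("2633, 4868", "126, 4860", "0",
                     "122, 4762", "4256", "2933, 2892, 2389"))
)

x <- as.character(v.m.two[[1]])

# Assumption: a row equal to "0" is treated as empty
non_zero <- x != "0"

# Approach 1: count runs of digits in each row
n_digits <- lengths(regmatches(x, gregexpr("[0-9]+", x)))

# Approach 2: strip everything except commas, count them, and add 1
n_commas <- nchar(gsub("[^,]", "", x)) + 1L

# Multiplying by the logical index zeroes out the empty rows
by_digits <- non_zero * n_digits
by_commas <- non_zero * n_commas

# Both approaches should give the same per-row counts
stopifnot(all(by_digits == by_commas))
by_digits
# [1] 2 2 0 2 1 3
```

The two counts would diverge if a row had a trailing comma or stray non-numeric text, so a check like this is a cheap way to spot malformed rows in the full 1000-row data.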
    

data

v.m.two <- structure(list(v.m.two = structure(c(4L, 3L, 1L, 2L, 6L, 5L), 
.Label = c("0", "122, 4762", "126, 4860", "2633, 4868", "2933, 2892, 2389",
 "4256"), class = "factor")), .Names = "v.m.two", row.names = c("1", 
"2", "3", "4", "5", "6"), class = "data.frame")

Upvotes: 0
