Assess each row of a factor in R

Question

I have a factor with 1000 rows and 848 levels (i.e., some rows are empty). For each row, I want to count the number of elements (one element = 1, two elements = 2, empty row = 0, etc.). Put another way: I want to convert the factor into a data.frame, changing the data type from factor to numeric while keeping the values in each row.

v.m.two <- Output[,1]
v.m.two <- data.frame(v.m.two)
class(v.m.two)
[1] data.frame
class(v.m.two[1,])
[1] factor
dim(v.m.two)
[1] 1000 1
v.m.two[1,]
[1] 848 Levels: 0 1000 1002, 4875, 4082, 1952 1015, 2570, 3524 1017 1020, 1576 ... 983, 4381,
2256, 4361, 4271

Any suggestions?

           v.m.two
1       2633, 4868
2        126, 4860
3                0
4        122, 4762
5             4256
6 2933, 2892, 2389

Basically, I want to count the values in each row (e.g., row 1 is 2, row 2 is 2, row 3 is 0, etc.).

akrun · Accepted Answer

1. Converting factor to numeric

If you want to convert the factor column to numeric, with separate columns based on the number of elements in each row, you could use cSplit from splitstackshape:

 library(splitstackshape)
 res <- cSplit(v.m.two, 'v.m.two', sep=",")
 res
 #    v.m.two_1 v.m.two_2 v.m.two_3
 #1:      2633      4868        NA
 #2:       126      4860        NA
 #3:         0        NA        NA
 #4:       122      4762        NA
 #5:      4256        NA        NA
 #6:      2933      2892      2389

  str(res)
  #Classes ‘data.table’ and 'data.frame':   6 obs. of  3 variables:
  # $ v.m.two_1: int  2633 126 0 122 4256 2933
  # $ v.m.two_2: int  4868 4860 NA 4762 NA 2892
  # $ v.m.two_3: int  NA NA NA NA NA 2389

If you need a vector, you could use stri_split from stringi:

  library(stringi)
  as.numeric(unlist(stri_split(v.m.two[,1], regex=",")))
  #[1] 2633 4868  126 4860    0  122 4762 4256 2933 2892 2389
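
If you would rather avoid the extra package, the same vector can probably be built with strsplit from base R. This is only a sketch: the name x is a stand-in for v.m.two[,1], rebuilt here from the sample values so the snippet runs on its own, and the split pattern assumes the values are separated by a comma with optional spaces.

```r
# Hypothetical stand-in for v.m.two[,1], using the sample values from the question
x <- factor(c("2633, 4868", "126, 4860", "0", "122, 4762",
              "4256", "2933, 2892, 2389"))

# Split each row on commas, swallowing any surrounding spaces
parts <- strsplit(as.character(x), "\\s*,\\s*")

# One numeric vector per row, in case the row structure is still needed
nums_by_row <- lapply(parts, as.numeric)

# Flatten into a single numeric vector
nums <- unlist(nums_by_row)
nums
```

Keeping nums_by_row around is a design choice: the list preserves which values came from which row, while unlist drops that information.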

2. Counting values in row

To count the values in each row of v.m.two, you could count either from res above or from v.m.two directly. In the first option, we count the non-NA values in each row of res and multiply by the logical index derived from whether the first column of v.m.two is 0 or not. The TRUE values (i.e., != 0) keep the count, while FALSE coerces to 0 (i.e., 0 * value = 0).
```
  (v.m.two[,1]!=0)*(rowSums(!is.na(res)))
  #[1] 2 2 0 2 1 3    
```
You could also use stri_count from stringi, which would be fast (see "counting occurrence of particular letter in vector of words in r"). As above, you can either use the arithmetic (i.e., multiplying) or use ifelse. The regex can be based on digits or on the comma. If you count commas, make sure to add 1, since a row with n commas holds n + 1 values.
```
  ifelse(v.m.two[,1]!=0, stri_count(v.m.two[,1], regex="\\d+"), 0)
  #[1] 2 2 0 2 1 3
  #Or
  (v.m.two[,1]!=0) * stri_count(v.m.two[,1], regex="\\d+")
  #[1] 2 2 0 2 1 3
  #Or
  (v.m.two[,1]!=0) * (stri_count(v.m.two[,1], regex=",") + 1)
  #[1] 2 2 0 2 1 3
```

Another option to count would be to use gsub and nchar from base R.

  (v.m.two[,1]!=0) *( nchar(gsub("[^,]", "", v.m.two[,1]))+1)
  #[1] 2 2 0 2 1 3
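
A further base R variant, offered as a sketch rather than as part of the original answer, splits each row and takes the number of pieces with lengths. It assumes R 3.2.0 or later (for lengths and trimws), and x is a hypothetical stand-in for v.m.two[,1] rebuilt from the sample values.

```r
# Hypothetical stand-in for v.m.two[,1], using the sample values from the question
x <- factor(c("2633, 4868", "126, 4860", "0", "122, 4762",
              "4256", "2933, 2892, 2389"))
chr <- as.character(x)

# Number of comma-separated pieces in each row
n <- lengths(strsplit(chr, ",", fixed = TRUE))

# Rows holding only "0" are treated as empty, as in the question
n[trimws(chr) == "0"] <- 0L
n
```

Splitting with fixed = TRUE skips the regex engine, which seems reasonable here because the separator is a literal comma.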

data

v.m.two <- structure(list(v.m.two = structure(c(4L, 3L, 1L, 2L, 6L, 5L), 
.Label = c("0", "122, 4762", "126, 4860", "2633, 4868", "2933, 2892, 2389",
 "4256"), class = "factor")), .Names = "v.m.two", row.names = c("1", 
"2", "3", "4", "5", "6"), class = "data.frame")
