googleplex101
googleplex101

Reputation: 195

How to create an output with the number of observations in each row?

So I have this data which is very messy and in particular one column is causing me huge problems. Basically instead of having one observation per row, some of the rows have two or more observations, usually separated by a comma.

My challenge is to separate out the different observations into different columns but the separate function in the tidyr package doesnt work because some rows have more observations than others.

So I figure I need something that tells me how many comma separated observations I am dealing with in each row. Then I can maybe move forward with separating them all out. But I have no idea what that something might be.

Any help you can give me would be much appreciated.

Thanks in advance

Upvotes: 1

Views: 148

Answers (3)

Colonel Beauvel
Colonel Beauvel

Reputation: 31171

You can use regex if you to know the number of comma:

len = sapply(gregexpr(",",df$x), length) + 1
len
#[1] 3 2 4

And then use separate from tidyr by specifying extra option:

library(tidyr)
separate(df, x, sep=',', into=paste0('col',1:max(len)), extra = "merge")
#  col1 col2 col3 col4
#1   a1   a2   a3 <NA>
#2   b1   b2 <NA> <NA>
#3   c1   c2   c3   c4

Note than behind separate there is no more than strsplit function!

Data:

df = data.frame(x = c("a1,a2,a3", "b1,b2", "c1,c2,c3,c4"))

Upvotes: 1

akrun
akrun

Reputation: 887118

You could try cSplit ("df" from @Colonel Beauvel's post)

library(splitstackshape)
cSplit(df, 'x', sep=',')
#  x_1 x_2 x_3 x_4
#1:  a1  a2  a3  NA
#2:  b1  b2  NA  NA
#3:  c1  c2  c3  c4

Upvotes: 2

Peter Diakumis
Peter Diakumis

Reputation: 4042

Perhaps you can use strsplit to split your comma-separated variable into a list of strings and then use the sapply function to count the length of each list element:

> x <- c("a1,a2,a3", "b1,b2", "c1,c2,c3,c4")
> (tmp <- strsplit(x, ","))
[[1]]
[1] "a1" "a2" "a3"

[[2]]
[1] "b1" "b2"

[[3]]
[1] "c1" "c2" "c3" "c4"

> sapply(tmp, length)
[1] 3 2 4

Upvotes: 1

Related Questions