Reputation: 1438
I used to fiddle with R and now it all seems to have escaped me . . .
I have a table with a few hundred columns and about 100k rows. One of those columns contains strings that sometimes have commas in them (e.g. chicken,goat,cow or just chicken). I need a script, with (I believe) a for loop, that creates a new column (I know the code that creates the new column should not be inside the loop), counts the number of commas in each string and adds one, so I can find out how many entries are in each row of that column. An example:
col
chicken
chicken,goat
cow,chicken,goat
cow
I want a script to create an additional column in the table that would look like . . .
col2
1
2
3
1
Upvotes: 1
Views: 496
Reputation: 193507
I would use count.fields (from base R):
mydf$col2 <- count.fields(file = textConnection(as.character(mydf$col)),
sep = ",")
mydf
# col col2
# 1 chicken 1
# 2 chicken,goat 2
# 3 cow,chicken,goat 3
# 4 cow 1
count.fields has a logical argument, blank.lines.skip, which defaults to TRUE. So, to capture information for empty lines, just set that to FALSE.
Example:
mydf <- data.frame(col = c("chicken", "", "chicken,goat", "cow,chicken,goat", "cow"))
count.fields(file = textConnection(as.character(mydf$col)),
sep = ",", blank.lines.skip=FALSE)
# [1] 1 0 2 3 1
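One caveat worth checking before relying on this: count.fields also has quote and comment.char arguments, which by default treat ' and " as quote characters and # as a comment marker. Here is a minimal sketch, assuming the column might contain such characters, that switches both off and closes the text connection afterwards. The sample values and the name con are made up for illustration.

```r
# Hypothetical data: one entry has an apostrophe, another has a '#'
mydf <- data.frame(col = c("chicken", "farmer's goat,cow", "cow #2,goat,hen"),
                   stringsAsFactors = FALSE)

con <- textConnection(mydf$col)
# quote = "" and comment.char = "" make ' " and # ordinary characters
mydf$col2 <- count.fields(con, sep = ",", quote = "", comment.char = "",
                          blank.lines.skip = FALSE)
close(con)  # release the text connection explicitly
mydf
```

If your data never contains quotes or #, the shorter call above should behave the same.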
Upvotes: 7
Reputation: 25726
You could use ?strsplit:
df <- data.frame(col=c("chicken", "chicken,goat", "cow,chicken,goat", "cow"), stringsAsFactors=FALSE)
df$col2 <- sapply(strsplit(df$col, ","), length)
df
# col col2
# 1 chicken 1
# 2 chicken,goat 2
# 3 cow,chicken,goat 3
# 4 cow 1
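If you are on R 3.2.0 or later, lengths() can stand in for sapply(..., length). The sketch below is a variant of the code above, not a different method; fixed = TRUE is added on the assumption that the separator is always a literal comma rather than a regular expression.

```r
df <- data.frame(col = c("chicken", "chicken,goat", "cow,chicken,goat", "cow"),
                 stringsAsFactors = FALSE)

# strsplit() returns one character vector per row; lengths() counts the pieces
df$col2 <- lengths(strsplit(df$col, ",", fixed = TRUE))
df
```

Note that an empty string splits into zero pieces, so it is counted as 0 here.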
Upvotes: 0
Reputation: 66819
A loop is not needed here, I think. Using the stringr package...
library(stringr)
# str_count() is vectorised, so it can be applied to the whole column at once
dat$aninum <- str_count(dat$ani, pattern = ",") + 1
which gives
ani aninum
1 chicken 1
2 chicken,goat 2
3 cow,chicken,goat 3
4 cow 1
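If you would rather not load a package, the same comma count can be sketched in base R by comparing string lengths before and after deleting the commas. This swaps in nchar() and gsub() for str_count(); the intermediate name n_commas is just for illustration.

```r
dat <- data.frame(ani = c("chicken", "chicken,goat", "cow,chicken,goat", "cow"),
                  stringsAsFactors = FALSE)

# Number of commas = characters lost when every comma is removed
n_commas <- nchar(dat$ani) - nchar(gsub(",", "", dat$ani, fixed = TRUE))
dat$aninum <- n_commas + 1
dat
```

Like the stringr version, this reports 1 for an empty string, because it counts commas and adds one rather than counting non-empty entries.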
Upvotes: 2