n8sty

Reputation: 1438

R for loop: create a new column with the count of a substring from a different column

I used to fiddle with R and now it all seems to have escaped me . . .

I have a table with a few hundred columns and about 100k rows. One of those columns contains strings that sometimes have commas in them (e.g. chicken,goat,cow or just chicken). I need a script, with (I believe) a for loop, that creates a new column (I know the code that creates the column should not be inside the loop), counts the number of commas in each string (that is, the number of entries less one), and adds one, so I can find out how many entries are in each row of that column. An example:

col
chicken
chicken,goat
cow,chicken,goat
cow

I want a script to create an additional column in the table that would look like . . .

col2
1
2
3
1

Upvotes: 1

Views: 496

Answers (3)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193507

I would use count.fields (from base R):

mydf$col2 <- count.fields(file = textConnection(as.character(mydf$col)), 
                          sep = ",")
mydf
#                col col2
# 1          chicken    1
# 2     chicken,goat    2
# 3 cow,chicken,goat    3
# 4              cow    1

Update: Accounting for blank lines

count.fields has a logical argument blank.lines.skip, which defaults to TRUE. So, to capture information for empty lines, just set that to FALSE.

Example:

mydf <- data.frame(col = c("chicken", "", "chicken,goat", "cow,chicken,goat", "cow"))

count.fields(file = textConnection(as.character(mydf$col)), 
             sep = ",", blank.lines.skip=FALSE)
# [1] 1 0 2 3 1
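
One practical detail with this approach: textConnection() opens a connection that count.fields() does not close on its own, so R may later warn about unused connections. A minimal sketch of a wrapper that closes it, where the helper name count_entries is hypothetical and not part of base R or of the answer above:

```r
# Hypothetical helper; the name count_entries is illustrative only.
count_entries <- function(x, sep = ",") {
  con <- textConnection(as.character(x))
  # Close the text connection when the function exits, even on error.
  on.exit(close(con), add = TRUE)
  count.fields(con, sep = sep, blank.lines.skip = FALSE)
}

mydf <- data.frame(col = c("chicken", "chicken,goat", "cow,chicken,goat", "cow"))
mydf$col2 <- count_entries(mydf$col)
```

Note that count.fields() also treats quote characters and # specially by default, so values containing those characters may need the quote and comment.char arguments adjusted.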

Upvotes: 7

sgibb

Reputation: 25726

You could use ?strsplit:

df <- data.frame(col=c("chicken", "chicken,goat", "cow,chicken,goat", "cow"), stringsAsFactors=FALSE)
df$col2 <- sapply(strsplit(df$col, ","), length)
df
#                col col2
# 1          chicken    1
# 2     chicken,goat    2
# 3 cow,chicken,goat    3
# 4              cow    1
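
A slightly more compact variant of the same idea, sketched on the assumption that R 3.2.0 or later is available (where lengths() was introduced). It replaces the sapply() call with lengths() and passes fixed = TRUE so the comma is matched literally rather than as a regular expression:

```r
df <- data.frame(col = c("chicken", "chicken,goat", "cow,chicken,goat", "cow"),
                 stringsAsFactors = FALSE)

# strsplit() returns one character vector per row;
# lengths() gives the length of each of those vectors.
df$col2 <- lengths(strsplit(df$col, ",", fixed = TRUE))
```

On 100k rows this avoids calling length() once per element from R code, which is usually the slower part of the sapply() version.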

Upvotes: 0

Frank

Reputation: 66819

A loop is not needed here, I think. Using the stringr package...

library(stringr)
# str_count() is vectorized, so no sapply() wrapper is needed
dat$aninum <- str_count(dat$ani, pattern = ",") + 1

which gives

               ani aninum
1          chicken      1
2     chicken,goat      2
3 cow,chicken,goat      3
4              cow      1
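
The same "count the commas and add one" idea can be sketched in base R without extra packages. The vector x below is illustrative data, not from the question. The sketch also shows one caveat of the plus-one approach: an empty string contains no commas, so it would be counted as one entry unless it is handled separately.

```r
# Illustrative input; the last element is an empty string.
x <- c("chicken", "chicken,goat", "cow,chicken,goat", "cow", "")

# Number of commas = length of the string minus its length with commas removed.
n_commas <- nchar(x) - nchar(gsub(",", "", x, fixed = TRUE))

# Add one to get the number of entries, but report 0 for empty strings.
n_entries <- ifelse(nzchar(x), n_commas + 1L, 0L)
n_entries
# [1] 1 2 3 1 0
```

Whether an empty string should count as zero or one entry depends on the data, so the ifelse() step is an assumption about the desired behavior.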

Upvotes: 2
