Reputation: 13
Please excuse the very novice question, but I'm trying to create a new column in a data frame that contains percentages based on other columns. For example, the data I'm working with is similar to the following, where the That column is a binary factor (i.e. presence or absence of "that"), the Verb column is the individual verb (i.e. verbs that may or may not be followed by "that"), and the Freq column indicates the frequency of each individual verb.
That Verb Freq
1 That believe 3
2 NoThat think 4
3 That say 3
4 That believe 3
5 That think 4
6 NoThat say 3
7 NoThat believe 3
8 NoThat think 4
9 That say 3
10 NoThat think 4
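To make the example reproducible, here is a minimal sketch that rebuilds the sample data above as an R data frame. The object name dat is an assumption; That is stored as a factor to match the description, and Freq is typed in exactly as shown (in this sample it equals the number of rows for each verb).

```r
# Rebuild the sample data shown above.
# The name `dat` is an assumption chosen for illustration.
dat <- data.frame(
  That = factor(c("That", "NoThat", "That", "That", "That",
                  "NoThat", "NoThat", "NoThat", "That", "NoThat")),
  Verb = c("believe", "think", "say", "believe", "think",
           "say", "believe", "think", "say", "think"),
  Freq = c(3, 4, 3, 3, 4, 3, 3, 4, 3, 4)
)

# Inspect column types: That is a factor, Verb is character, Freq is numeric
str(dat)
```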
What I want is to add another column that provides the overall rate of "that" expression (coded as "That") for each of the different verbs. Something like the following:
That Verb Freq Perc.That
1 That believe 3 33.3
2 NoThat think 4 25.0
3 That say 3 33.3
4 That believe 3 33.3
5 That think 4 25.0
6 NoThat say 3 33.3
7 NoThat believe 3 33.3
8 NoThat think 4 25.0
9 That say 3 33.3
10 NoThat think 4 25.0
It may be that I've missed a similar question elsewhere. If so, my apologies. Nevertheless, thanks in advance for any help.
Upvotes: 1
Views: 1552
Reputation: 3622
You want to use the ddply function in the plyr package:
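# install.packages("plyr")  # run once if plyr is not installed
library(plyr)
# dat is your data frame, with columns That, Verb and Freq
# Split dat by Verb and add perc.that = Freq / sum(Freq) within each verb
# (multiply by 100 for a percentage)
ddply(dat, .(Verb), transform, perc.that = Freq/sum(Freq))
# That Verb Freq perc.that
#1 That believe 3 0.3333333
#2 That believe 3 0.3333333
#3 NoThat believe 3 0.3333333
#4 That say 3 0.3333333
#...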
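A plyr-free sketch of the same idea in base R, using ave() to apply a function within each Verb group while keeping the original row order. Because Freq is constant within a verb, Freq/sum(Freq) works out to 1 / (number of rows for that verb), which is what reproduces the 33.3 and 25.0 figures in the question's table. If what is wanted is literally the percentage of rows coded "That" for each verb, averaging That == "That" per group gives that instead (on the sample data: 66.7 for believe and say, 25.0 for think). The column name share.that is hypothetical, and the data is rebuilt inline so the block runs on its own.

```r
# Rebuild the question's sample data so this block is self-contained
dat <- data.frame(
  That = c("That", "NoThat", "That", "That", "That",
           "NoThat", "NoThat", "NoThat", "That", "NoThat"),
  Verb = c("believe", "think", "say", "believe", "think",
           "say", "believe", "think", "say", "think"),
  Freq = c(3, 4, 3, 3, 4, 3, 3, 4, 3, 4)
)

# Same quantity as the ddply call: Freq / sum(Freq) within each Verb
dat$perc.that <- ave(dat$Freq, dat$Verb, FUN = function(x) x / sum(x))

# Hypothetical extra column: percentage of rows coded "That" within each Verb
dat$share.that <- 100 * ave(as.numeric(dat$That == "That"), dat$Verb, FUN = mean)

dat
```

Wrapping either column in round(..., 1) gives one decimal place, as in the question's table.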
Upvotes: 1