user3388984

Reputation: 13

How do you create a new column containing percentage data calculated from other columns?

Please excuse the very novice question, but I'm trying to create a new column in a data frame that contains percentages based on other columns. For example, the data I'm working with is similar to the following, where the That column is a binary factor (i.e. presence or absence of "that"), the Verb column is the individual verb (i.e. verbs that may or may not be followed by "that"), and the Freq column indicates the frequency of each individual verb.

     That    Verb Freq
1    That believe    3
2  NoThat   think    4
3    That     say    3
4    That believe    3
5    That   think    4
6  NoThat     say    3
7  NoThat believe    3
8  NoThat   think    4
9    That     say    3
10 NoThat   think    4

What I want is to add another column that provides the overall rate of "that" expression (coded as "That") for each of the different verbs. Something like the following:

     That    Verb Freq Perc.That
1    That believe    3      33.3
2  NoThat   think    4      25.0
3    That     say    3      33.3
4    That believe    3      33.3
5    That   think    4      25.0
6  NoThat     say    3      33.3
7  NoThat believe    3      33.3
8  NoThat   think    4      25.0
9    That     say    3      33.3
10 NoThat   think    4      25.0

It may be that I've missed a similar question elsewhere. If so, my apologies. Nevertheless, thanks in advance for any help.

Upvotes: 1

Views: 1552

Answers (1)

maloneypatr

Reputation: 3622

You can use the ddply function from the plyr package:

#install.packages('plyr')
library(plyr)

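# dat is your data frame, with columns That, Verb and Freq
# (R is case-sensitive, so the names must match the data exactly)

# Split by Verb, then add each row's Freq as a share of that verb's total Freq
ddply(dat, .(Verb), transform, Perc.That = Freq/sum(Freq))

#     That    Verb Freq Perc.That
#1    That believe    3 0.3333333
#2    That believe    3 0.3333333
#3  NoThat believe    3 0.3333333
#4    That     say    3 0.3333333
#...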
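
Note that Freq/sum(Freq) is simply one divided by the number of rows for that verb in the sample data, so it never looks at the That column. If what you are after is the share of rows coded "That" within each verb, the following is a minimal base R sketch, assuming the column names from the question and assuming the sample rows are rebuilt by hand as below. The data frame name dat and the rounding to one decimal are illustrative choices, not part of the original answer.

```r
# Rebuild the sample data from the question (typed in by hand for illustration)
dat <- data.frame(
  That = c("That", "NoThat", "That", "That", "That",
           "NoThat", "NoThat", "NoThat", "That", "NoThat"),
  Verb = c("believe", "think", "say", "believe", "think",
           "say", "believe", "think", "say", "think"),
  Freq = c(3, 4, 3, 3, 4, 3, 3, 4, 3, 4)
)

# 1 where the row is coded "That", 0 otherwise
is_that <- as.numeric(dat$That == "That")

# ave() applies FUN within each Verb group and returns one value per row,
# in the original row order; the group mean of a 0/1 vector is the proportion of 1s
dat$Perc.That <- round(100 * ave(is_that, dat$Verb, FUN = mean), 1)

dat
```

On this sample, that gives 66.7 for believe and say (two of three rows are "That") and 25.0 for think (one of four). The same grouping could be done with ddply by replacing the transform expression with 100 * mean(That == "That").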

Upvotes: 1
