Reputation: 19
My data set is attached. I would like to create a new column that has the length of each unique quarter (QUART) identifier. In other words, for each row I want to create a new value that has the number of times its corresponding QUART appears in the data set
So row 1 should have a new column with the value "4" because 1992.2 occurs 4 times.
My data structure looks like"
ID QUART Trasaction New Column (I want)
1 1992.2 Company 1 4
2 1992.2 Company 2 4
3 1992.2 Company 3 4
4 1992.2 Company 4 4
5 1992.3 Company 5 1
6 1992.4 Company 6 1
7 1993.1 Company 7 1
Thanks
Upvotes: 0
Views: 39
Reputation: 20085
One option in base-R can be achieved using mapply
as:
df$count <- mapply(function(x)sum(df$QUART == x), df$QUART)
# ID QUART Trasaction count
# 1 1 1992.2 Company 1 4
# 2 2 1992.2 Company 2 4
# 3 3 1992.2 Company 3 4
# 4 4 1992.2 Company 4 4
# 5 5 1992.3 Company 5 1
# 6 6 1992.4 Company 6 1
# 7 7 1993.1 Company 7 1
Note: Since QUART
is of type numeric
/ double
. Hence my suggestion is to instead of comparing with ==
difference of two values should be compared against double precision
limit of hardware. To take care of those situation a solution could be as
mapply(function(x)sum(abs(df$QUART - x) <= 0.000001), df$QUART)
Data
df <- read.table(text = "
ID QUART Trasaction
1 1992.2 'Company 1'
2 1992.2 'Company 2'
3 1992.2 'Company 3'
4 1992.2 'Company 4'
5 1992.3 'Company 5'
6 1992.4 'Company 6'
7 1993.1 'Company 7'",
header = TRUE, stringsAsFactors = FALSE)
Upvotes: 0
Reputation: 50668
You can use dplyr::group_by
with n()
to count the number of identical entries per QUART
:
library(tidyverse);
df %>%
group_by(QUART) %>%
mutate(count = n());
## A tibble: 7 x 4
## Groups: QUART [4]
# ID QUART Trasaction count
# <int> <dbl> <fct> <int>
#1 1 1992. Company 1 4
#2 2 1992. Company 2 4
#3 3 1992. Company 3 4
#4 4 1992. Company 4 4
#5 5 1992. Company 5 1
#6 6 1992. Company 6 1
#7 7 1993. Company 7 1
df <- read.table(text =
"ID QUART Trasaction
1 1992.2 'Company 1'
2 1992.2 'Company 2'
3 1992.2 'Company 3'
4 1992.2 'Company 4'
5 1992.3 'Company 5'
6 1992.4 'Company 6'
7 1993.1 'Company 7'", header = T)
Upvotes: 1