Alex
Alex

Reputation: 163

How to group the data frame in R so that resulting table counts the occurrence of positive and negative value of a column

I am using this data set (csv file). It is stored in R as a data frame imf.inflation. Now, my grouping command is:

imf.inflation %>% group_by(country_iso_code) %>% summarise(num_yrs=n())

And produces this table:

# A tibble: 194 x 2
   country_iso_code num_yrs
   <chr>              <int>
 1 ABW                   25
 2 AFG                   18
 3 AGO                   41
 4 ALB                   31
 5 ARE                   41
 6 ARG                   20
 7 ARM                   28
 8 ATG                   41
 9 AUS                   41
10 AUT                   41
# ... with 184 more rows

I'd like to extend this grouping command so that the resulting table has additional two columns: (1) one for the count of the occurrence of positive numbers in the {value} column, (2) another for the count of the occurrence of negative numbers. Thanks!

Upvotes: 1

Views: 145

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388982

In base R, you can use table -

table(imf.inflation$country_iso_code, sign(imf.inflation$value))

#      -1  0  1
#  ABW  5  0 20
#  AFG  2  0 16
#  AGO  0  0 41
#  ALB  1  0 30
#  ARE  2  0 39
#  ARG  3  0 17
#...
#...

where -1 is for negative numbers, 1 for positive numbers and 0 for 0 values.


A tidyverse way would be to use count and pivot_wider but here 0 values are counted under negative.

imf.inflation %>% 
  count(country_iso_code, sign = ifelse(sign(value) == 1, 'posiitve', 'negative')) %>%
  pivot_wider(names_from = sign, values_from = n, values_fill = 0)

#  country_iso_code negative posiitve
#   <chr>               <int>    <int>
# 1 ABW                     5       20
# 2 AFG                     2       16
# 3 AGO                     0       41
# 4 ALB                     1       30
# 5 ARE                     2       39
# 6 ARG                     3       17
# 7 ARM                     2       26
# 8 ATG                     3       38
# 9 AUS                     0       41
#10 AUT                     0       41
# … with 184 more rows

Upvotes: 2

akrun
akrun

Reputation: 887108

Create a logical column and get the sum of it

library(dplyr)
imf.inflation  %>% 
       group_by(country_iso_code) %>%
       summarise(num_yrs = n(), 
                 num_pos = sum(value > 0, na.rm = TRUE), 
                 num_neg = sum(value < 0, na.rm = TRUE))

-output

# A tibble: 194 x 4
   country_iso_code num_yrs num_pos num_neg
   <chr>              <int>   <int>   <int>
 1 ABW                   25      20       5
 2 AFG                   18      16       2
 3 AGO                   41      41       0
 4 ALB                   31      30       1
 5 ARE                   41      39       2
 6 ARG                   20      17       3
 7 ARM                   28      26       2
 8 ATG                   41      38       3
 9 AUS                   41      41       0
10 AUT                   41      41       0
# … with 184 more rows

NOTE: In most cases num_neg = num_yrs - num_pos, but in case if there are 0 values, we may need to take care of that (it depends on whether the OP include 0 in positive or not)

Upvotes: 3

Related Questions