Reputation: 61

Performing a computation on subset using R and dplyr

I have a df of 4 columns - c("Observation.ID", "Event.Type", "Property.Damage", "Magnitude").

Magnitude values signify whether property damage is given in thousands, millions, or billions of dollars ("K","M","B").

I want to normalize Property.Damage, so I need to separately compute for the 3 groups:

update df set Property.Damage=(Property.Damage*n) where Magnitude='K'

In dplyr, I understand how to split on class, add the recomputed property damage, combine, then summarize. Surely it's possible to do this more simply, a la SQL?

Edit: I went with data.table because it feels quick/easy compared to base. E.g.:

setkey(df1, Magnitude)
df1["K", PROPDMG := PROPDMG*1e3]
df1["M", PROPDMG := PROPDMG*1e6]
df1["B", PROPDMG := PROPDMG*1e7]

Upvotes: 0

Answers (3)

Q-Dawg

Reputation: 61

I found that data.table was the most appealing approach. In fact, this has switched me from dplyr to data.table for split/apply/combine. Although it appears base R makes for the fewest keystrokes - I find data.table's i,j, := is less wonky parenthetically.

setkey(df1, Magnitude)
df1["K", PROPDMG := PROPDMG*1e3]
df1["M", PROPDMG := PROPDMG*1e6]
df1["B", PROPDMG := PROPDMG*1e7]

Alternatively, we can create another data.table as follows:

df2 = data.table(Magnitude = c("K", "M", "B"), mult = c(1e3, 1e6, 1e9))

and then perform an update while joining as follows:

df1[df2, PROPDMG := PROPDMG*mult, by=.EACHI, on="Magnitude"]

on= allows to perform binary search based subsets/joins without having to set keys. by=.EACHI evaluates expression in j for each row in df2.

Upvotes: 3

akrun

Reputation: 887961

We could use base R to do this

transform(df1, Property.Damage = Property.Damage * setNames(c(1e3, 
            1e6, 1e9), c("K", "M", "B"))[Magnitude])

data

df1 <- data.frame(Observation.ID = 1:5, Event.Type = LETTERS[1:5], 
     Property.Damage = c(1, 5, 3, 4, 7), 
     Magnitude = c("K", "M", "K", "B", "M"), stringsAsFactors=FALSE)

Upvotes: 1

thelatemail

Reputation: 93938

You may be better off just making a look up table and merging this back in before doing the multiplication. Something like:

df <- data.frame(propdmg=1:6, magnitude=rep(c("K","M","B"),each=2))
df

#  propdmg magnitude
#1       1         K
#2       2         K
#3       3         M
#4       4         M
#5       5         B
#6       6         B

lkup <- data.frame(magnitude=c("K","M","B"),mult=c(1e3,1e6,1e7))
left_join(df, lkup) %>% mutate(result=propdmg * mult, mult=NULL)

#Joining by: "magnitude"
#  propdmg magnitude result
#1       1         K  1e+03
#2       2         K  2e+03
#3       3         M  3e+06
#4       4         M  4e+06
#5       5         B  5e+07
#6       6         B  6e+07

The direct equivalent in base R would be:

transform(merge(df, lkup), result=mult * propdmg, mult=NULL)

Upvotes: 3

Performing a computation on subset using R and dplyr

Answers (3)

data

Related Questions