Reputation: 61
I have a df of 4 columns - c("Observation.ID", "Event.Type", "Property.Damage", "Magnitude").
Magnitude values signify whether property damage is given in thousands, millions, or billions of dollars ("K","M","B").
I want to normalize Property.Damage, so I need to separately compute for the 3 groups:
update df set Property.Damage=(Property.Damage*n) where Magnitude='K'
In dplyr, I understand how to split on class, add the recomputed property damage, combine, then summarize. Surely it's possible to do this more simply, a la SQL?
Edit: I went with data.table because it feels quick/easy compared to base. E.g.:
setkey(df1, Magnitude)
df1["K", PROPDMG := PROPDMG*1e3]
df1["M", PROPDMG := PROPDMG*1e6]
df1["B", PROPDMG := PROPDMG*1e7]
Upvotes: 0
Views: 90
Reputation: 61
I found that data.table was the most appealing approach. In fact, this has switched me from dplyr to data.table for split/apply/combine. Although it appears base R
makes for the fewest keystrokes - I find data.table's i,j, := is
less wonky parenthetically.
setkey(df1, Magnitude)
df1["K", PROPDMG := PROPDMG*1e3]
df1["M", PROPDMG := PROPDMG*1e6]
df1["B", PROPDMG := PROPDMG*1e7]
Alternatively, we can create another data.table as follows:
df2 = data.table(Magnitude = c("K", "M", "B"), mult = c(1e3, 1e6, 1e9))
and then perform an update while joining as follows:
df1[df2, PROPDMG := PROPDMG*mult, by=.EACHI, on="Magnitude"]
on=
allows to perform binary search based subsets/joins without having to set keys. by=.EACHI
evaluates expression in j
for each row in df2
.
Upvotes: 3
Reputation: 887961
We could use base R
to do this
transform(df1, Property.Damage = Property.Damage * setNames(c(1e3,
1e6, 1e9), c("K", "M", "B"))[Magnitude])
df1 <- data.frame(Observation.ID = 1:5, Event.Type = LETTERS[1:5],
Property.Damage = c(1, 5, 3, 4, 7),
Magnitude = c("K", "M", "K", "B", "M"), stringsAsFactors=FALSE)
Upvotes: 1
Reputation: 93938
You may be better off just making a look up table and merging this back in before doing the multiplication. Something like:
df <- data.frame(propdmg=1:6, magnitude=rep(c("K","M","B"),each=2))
df
# propdmg magnitude
#1 1 K
#2 2 K
#3 3 M
#4 4 M
#5 5 B
#6 6 B
lkup <- data.frame(magnitude=c("K","M","B"),mult=c(1e3,1e6,1e7))
left_join(df, lkup) %>% mutate(result=propdmg * mult, mult=NULL)
#Joining by: "magnitude"
# propdmg magnitude result
#1 1 K 1e+03
#2 2 K 2e+03
#3 3 M 3e+06
#4 4 M 4e+06
#5 5 B 5e+07
#6 6 B 6e+07
The direct equivalent in base R would be:
transform(merge(df, lkup), result=mult * propdmg, mult=NULL)
Upvotes: 3