Reputation: 246
I have a set of data (10 columns, 1000 rows) that is indexed by an ID number that one or more of these rows can share. To give a small example to illustrate my point, consider this table:
ID Name Location
5014 John
5014 Kate California
5014 Jim
5014 Ryan California
5018 Pete
5018 Pat Indiana
5019 Jeff Arizona
5020 Chris Kentucky
5020 Mike
5021 Will Indiana
I need for all entries to have something in the Location field and I'm having a hell of a time trying to do it.
Things to note:
Any ideas for a solution? I'm currently using R with the data.table
package, but I'm relatively new to it.
Upvotes: 3
Views: 103
Reputation: 887163
We can convert the 'data.frame' to 'data.table' (setDT(df1)
), Grouped by 'ID', get the elements of Location
that are not ''
(Location[Location!=''][1L]
). Suppose, if there are more than one element per group that are not ''
, the [1L]
, selects the first non-blank element, and assign (:=
) the output to Location
library(data.table)
setDT(df1)[, Location := Location[Location != ''][1L], by = ID][]
# ID Name Location
# 1: 5014 John California
# 2: 5014 Kate California
# 3: 5014 Jim California
# 4: 5014 Ryan California
# 5: 5018 Pete Indiana
# 6: 5018 Pat Indiana
# 7: 5019 Jeff Arizona
# 8: 5020 Chris Kentucky
# 9: 5020 Mike Kentucky
#10: 5021 Will Indiana
Or we can use setdiff
as suggested by @Frank
setDT(df1)[, Location:= setdiff(Location,'')[1L], by = ID][]
Upvotes: 4