Jeff Yontz
Jeff Yontz

Reputation: 246

Fill in Blank Fields With a Value From Same Key Index

I have a set of data (10 columns, 1000 rows) that is indexed by an ID number that one or more of these rows can share. To give a small example to illustrate my point, consider this table:

ID       Name      Location
5014     John     
5014     Kate     California
5014     Jim
5014     Ryan     California
5018     Pete     
5018     Pat      Indiana
5019     Jeff     Arizona
5020     Chris    Kentucky
5020     Mike
5021     Will     Indiana

I need for all entries to have something in the Location field and I'm having a hell of a time trying to do it.

Things to note:

  1. Every unique ID number has at least one row with the location field populated.
  2. If two rows have the same ID number, they have the same location.
  3. Two different ID numbers can have the same location.
  4. ID numbers are not necessarily consecutive, nor are they necessarily completely numeric. The arrangement of them isn't of importance to me, since any rows that are related share the same ID number.

Any ideas for a solution? I'm currently using R with the data.table package, but I'm relatively new to it.

Upvotes: 3

Views: 103

Answers (1)

akrun
akrun

Reputation: 887163

We can convert the 'data.frame' to 'data.table' (setDT(df1)), Grouped by 'ID', get the elements of Location that are not '' (Location[Location!=''][1L]). Suppose, if there are more than one element per group that are not '', the [1L], selects the first non-blank element, and assign (:=) the output to Location

library(data.table)
setDT(df1)[, Location := Location[Location != ''][1L], by = ID][]
#     ID  Name   Location
# 1: 5014  John California
# 2: 5014  Kate California
# 3: 5014   Jim California
# 4: 5014  Ryan California
# 5: 5018  Pete    Indiana
# 6: 5018   Pat    Indiana
# 7: 5019  Jeff    Arizona
# 8: 5020 Chris   Kentucky
# 9: 5020  Mike   Kentucky
#10: 5021  Will    Indiana

Or we can use setdiff as suggested by @Frank

 setDT(df1)[, Location:= setdiff(Location,'')[1L], by = ID][]

Upvotes: 4

Related Questions