isebarn

Reputation: 3950

Julia - DataFrame advanced merging

I'm assembling data from multiple sources: specifically, reactions and reaction formulas.

Some sources have both the reaction name and the formula, while other sources may only have the formula. For an example, see rows 2 and 3 below.

If I have a DataFrame with the following:

│ Row │ reaction │ formula │
├─────┼──────────┼─────────┤
│ 1   │   "a"    │    1    │
│ 2   │   "b"    │    2    │
│ 3   │   ""     │    2    │
│ 4   │   "c"    │    3    │

As the table suggests, rows 2 and 3 have the same reaction formula, but only row 2 has the reaction name. What I'd like to do is remove the rows that have a formula but no name, when the same formula already exists in another row that does have the reaction name.

I.e. remove the rows that are duplicates w.r.t. column 2 (formula), keeping the duplicate whose reaction name is not empty, so as to get:

│ Row │ reaction │ formula │
├─────┼──────────┼─────────┤
│ 1   │   "a"    │    1    │
│ 2   │   "b"    │    2    │
│ 3   │   "c"    │    3    │

Upvotes: 1

Views: 125

Answers (1)

merch

Reputation: 945

Let's say you have:

using DataFrames
df = DataFrame(reaction = ["a", "b", "", "c"], formula = [1, 2, 2, 3]);

What you can do is the following:

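 # This index is true where the reaction name is not empty
 # (df.reaction replaces the older df[:reaction] syntax, which recent DataFrames.jl versions no longer accept):

 ind = df.reaction .!= "";

 # Then, you filter your DataFrame to keep only the rows with a reaction name:

 df2 = df[ind, :];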

Edit: You can make the selector more specific by refining how ind is defined, according to your needs.
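As one possible refinement of the selector (a minimal sketch, not necessarily what was intended above), you could drop an unnamed row only when another row with the same formula does carry a name. The names `named_formulas` and `keep` are made up for this illustration, and the code assumes a DataFrames.jl version that supports the `df.column` syntax:

```julia
using DataFrames

df = DataFrame(reaction = ["a", "b", "", "c"], formula = [1, 2, 2, 3])

# Formulas that occur at least once together with a non-empty reaction name.
named_formulas = Set(df.formula[df.reaction .!= ""])

# Keep a row if it has a name, or if no named row shares its formula.
keep = [r != "" || !(f in named_formulas) for (r, f) in zip(df.reaction, df.formula)]

df2 = df[keep, :]
```

On the example data this keeps the rows for "a", "b" and "c". Unlike the plain `.!= ""` filter, an unnamed row whose formula appears nowhere else is kept, so no formula is lost.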

Upvotes: 1
