Xizam
Xizam

Reputation: 718

Joining the data in two columns into one column in R

I have two columns in a dataset after merging two seperate datasets. I would like to merge these columns into one column, BNR.x.

For the cases listed below my prefered outcomes would be:
1. Nothing. BNR.x has data, that's fine.
2. Nothing. Data in both colums is the same, that's fine.
3. Data from BNR.y is copied to BNR.x
4. Nothing. Same as 2.
5. Data in colums is different. Preferably i'd get an extra column with a 1 of FALSE as warning in this row.
6. No Data. Preferably i'd get a warning here aswell to notify me that i dont have any data for this item.

+----+-------+-------+
| ID | BNR.x | BNR.y |
+----+-------+-------+
|  1 | 123   | NA    |
|  2 | 234   | 234   |
|  3 | NA    | 345   |
|  4 | 456   | 456   |
|  5 | 678   | 677   |
|  6 | NA    | NA    |
+----+-------+-------+

Is there a method or package that will do this for me?

Upvotes: 3

Views: 2643

Answers (3)

juba
juba

Reputation: 49033

If your data are in a data frame called d, you can do :

## Copy BNR.y if BNR.x is missing
d$BNR.x[is.na(d$BNR.x)] <- d$BNR.y[is.na(d$BNR.x)]
## List the indices of BNR.x that are still missing
which(is.na(d$BNR.x))
## List the indices where BNR.x is different from BNR.y
which(d$BNR.x != d$BNR.y)

Upvotes: 1

vodka
vodka

Reputation: 508

From:

df
V1  V2  V3
1  1 123  NA
...

df[which(is.na(df$V2)),]$V2 <- df[which(is.na(df$V2)),]$V3
df$warn <- 0
df[which(is.na(df$V2)),]$warn <- 1
df[which(df$V2 != df$V3 & !is.na(df$V3)),]$warn <- 1

Ok, overuse of which and transform is nicer, but I have to start somewhere :)

ps. am I wrong or

d$BNR.x[is.na(d$BNR.x)] <- d$BNR.y

won't work because it will place "wrongly aligned" BNR$y values in correspondence to BNR$x NAs?

Upvotes: 0

Sven Hohenstein
Sven Hohenstein

Reputation: 81683

Here's a proposal. datis the name of the data frame:

idx <- is.na(dat$BNR.x) # create logical index for NAs in BNR.x

dat$BNR.x[idx] <- dat$BNR.y[idx] # replace NAs with values from BNR.y

# Add a logical column:
dat <- transform(dat, warn = is.na(BNR.x) | (BNR.x != BNR.y & !is.na(BNR.y)))

The result:

  ID BNR.x BNR.y  warn
1  1   123    NA FALSE
2  2   234   234 FALSE
3  3   345   345 FALSE
4  4   456   456 FALSE
5  5   678   677  TRUE
6  6    NA    NA  TRUE

Upvotes: 2

Related Questions