Sreenath Raju
Sreenath Raju

Reputation: 11

How can I merge a single column from one file to another file

A

phy     44
chem    46  
maths   44  
biol    42
his     38
comm    40


B
Name    RaK
Phy     45
chem    43
maths   41
His     38
Comm    48
geo     49

Expected output

            RaK
phy     44  45
chem    46  43
maths   44  41
biol    42   0
his     38  38
comm    40  48

I need something like join(-a1 a,b). Is it possible to do it with R? The first file doesn't have column headers.

Upvotes: 1

Views: 216

Answers (2)

Uwe
Uwe

Reputation: 42544

The OP has requested to join or merge all rows of file A with file B so that the expected result contains biol (from A) but not geo (from B).

In addition, the Name column of B is partially in title case whereas the corresponding column of A is in lower case.

It is assumed that the data of A are stored in a csv file called "A.csv" where the first row of A.csv is empty (blank). Likewise for B but here the first row constains the column headers.

With data.table this can be done using a "one-liner":

library(data.table)
# read B, convert Name column to lower case
fread("B.csv")[, Name := tolower(Name)][
  # read A and right join = all rows of A, only matching rows of B
  fread("A.csv"), on = c("Name" = "V1")][
    # replace NA by 0 
    is.na(RaK), RaK := 0L][]
    Name RaK V2
1:   phy  45 44
2:  chem  43 46
3: maths  41 44
4:  biol   0 42
5:   his  38 38
6:  comm  48 40

fread() is data.table's own, faster version of read.table().

Data

For the sake of reproducibility, here are A and B after the calls to fread():

A <- structure(list(V1 = c("phy", "chem", "maths", "biol", "his", 
"comm"), V2 = c(44L, 46L, 44L, 42L, 38L, 40L)), .Names = c("V1", 
"V2"), row.names = c(NA, -6L), class = "data.frame")

B <- structure(list(Name = c("phy", "chem", "maths", "his", "comm", 
"geo"), RaK = c(45L, 43L, 41L, 38L, 48L, 49L)), .Names = c("Name", 
"RaK"), row.names = c(NA, -6L), class = "data.frame")

Upvotes: 2

lmo
lmo

Reputation: 38500

Read in the data:

A <- read.table(text="phy     44
chem    46  
maths   44  
biol    42
his     38
comm    40", as.is=TRUE)

B <- read.table(header=TRUE, text="Name    RaK
phy    45
chem    43
maths   41
his     38
comm    48
geo     49", as.is=TRUE)

The as.is=TRUE argument keeps the Name column as character rather than converting to factor. This is preferred when merging two different data.frames. Now, add column names to A.

names(A) <- c("Name", "col1")

Perform the merge, keeping all of the rows in A, regardless of whether B has such a row. Because both data.frames have a name in common, it is not necessary to add a by statement, though this is probably better practice.

dat <- merge(A, B, all.x=TRUE)

This returns

dat
   Name col1 RaK
1  biol   42  NA
2  chem   46  43
3  comm   40  48
4   his   38  38
5 maths   44  41
6   phy   44  45

Notice that the first value of Rak is missing (NA). To replace this with 0, you can use

dat$RaK[is.na(dat$RaK)] <- 0

and it is done.

dat
   Name col1 RaK
1  biol   42   0
2  chem   46  43
3  comm   40  48
4   his   38  38
5 maths   44  41
6   phy   44  45

Upvotes: 2

Related Questions