Reputation: 11
A
phy 44
chem 46
maths 44
biol 42
his 38
comm 40
B
Name RaK
Phy 45
chem 43
maths 41
His 38
Comm 48
geo 49
Expected output
RaK
phy 44 45
chem 46 43
maths 44 41
biol 42 0
his 38 38
comm 40 48
I need something like join(-a1 a,b)
. Is it possible to do it with R?
The first file doesn't have column headers.
Upvotes: 1
Views: 216
Reputation: 42544
The OP has requested to join or merge all rows of file A
with file B
so that the expected result contains biol
(from A
) but not geo
(from B
).
In addition, the Name
column of B
is partially in title case whereas the corresponding column of A
is in lower case.
It is assumed that the data of A
are stored in a csv file called "A.csv"
where the first row of A.csv
is empty (blank). Likewise for B
but here the first row constains the column headers.
With data.table
this can be done using a "one-liner":
library(data.table)
# read B, convert Name column to lower case
fread("B.csv")[, Name := tolower(Name)][
# read A and right join = all rows of A, only matching rows of B
fread("A.csv"), on = c("Name" = "V1")][
# replace NA by 0
is.na(RaK), RaK := 0L][]
Name RaK V2 1: phy 45 44 2: chem 43 46 3: maths 41 44 4: biol 0 42 5: his 38 38 6: comm 48 40
fread()
is data.table
's own, faster version of read.table()
.
For the sake of reproducibility, here are A
and B
after the calls to fread()
:
A <- structure(list(V1 = c("phy", "chem", "maths", "biol", "his",
"comm"), V2 = c(44L, 46L, 44L, 42L, 38L, 40L)), .Names = c("V1",
"V2"), row.names = c(NA, -6L), class = "data.frame")
B <- structure(list(Name = c("phy", "chem", "maths", "his", "comm",
"geo"), RaK = c(45L, 43L, 41L, 38L, 48L, 49L)), .Names = c("Name",
"RaK"), row.names = c(NA, -6L), class = "data.frame")
Upvotes: 2
Reputation: 38500
Read in the data:
A <- read.table(text="phy 44
chem 46
maths 44
biol 42
his 38
comm 40", as.is=TRUE)
B <- read.table(header=TRUE, text="Name RaK
phy 45
chem 43
maths 41
his 38
comm 48
geo 49", as.is=TRUE)
The as.is=TRUE argument keeps the Name column as character rather than converting to factor. This is preferred when merging two different data.frames. Now, add column names to A.
names(A) <- c("Name", "col1")
Perform the merge, keeping all of the rows in A, regardless of whether B has such a row. Because both data.frames have a name in common, it is not necessary to add a by statement, though this is probably better practice.
dat <- merge(A, B, all.x=TRUE)
This returns
dat
Name col1 RaK
1 biol 42 NA
2 chem 46 43
3 comm 40 48
4 his 38 38
5 maths 44 41
6 phy 44 45
Notice that the first value of Rak is missing (NA). To replace this with 0, you can use
dat$RaK[is.na(dat$RaK)] <- 0
and it is done.
dat
Name col1 RaK
1 biol 42 0
2 chem 46 43
3 comm 40 48
4 his 38 38
5 maths 44 41
6 phy 44 45
Upvotes: 2