Reputation: 721
Given a CSV with the following structure,
id, postCode, someThing, someOtherThing
1,E3 4AX, cats, dogs
2,E3 4AX, elephants, sheep
3,E8 KAK, mice, rats
4,VH3 2K2, humans, whales
I wish to create two tables, based on whether the value in the postCode
column is unique or not. The values of the other columns do not matter to me, but they have to be copied to the new tables.
My end data should look like this, with one table based on unique postCode
s:
id, postCode, someThing, someOtherThing
3,E8 KAK, mice, rats
4,VH3 2K2, humans, whales
And another where postCode
values are duplicated
id, postCode, someThing, someOtherThing
1,E3 4AX, cats, dogs
2,E3 4AX, elephants, sheep
So far I can load the data but I'm not sure of the next step:
myData <- read.csv("path/to/my.csv",
header=TRUE,
sep=",",
stringsAsFactors=FALSE
)
New to R so help appreciated.
Data in dput
format.
df <-
structure(list(id = 1:4, postCode = structure(c(1L, 1L, 2L, 3L
), .Label = c("E3 4AX", "E8 KAK", "VH3 2K2"), class = "factor"),
someThing = structure(c(1L, 2L, 4L, 3L), .Label = c(" cats",
" elephants", " humans", " mice"), class = "factor"),
someOtherThing = structure(c(1L, 3L, 2L, 4L),
.Label = c(" dogs", " rats", " sheep", " whales "
), class = "factor")), class = "data.frame",
row.names = c(NA, -4L))
Upvotes: 0
Views: 40
Reputation: 1080
If df is the name of your data.frame, which can be formed as:
df <- read.table(header = T, text = "
id, postCode, someThing, someOtherThing
1, E3 4AX, cats, dogs
2, E3 4AX, elephants, sheep
3, E8 KAK, mice, rats
4, VH3 2K2, humans, whales
")
Then the uniques and duplicates can be found using the funciton n()
, which collects the number of observation per grouped variable
. Then,
uniques = df %>%
group_by(postCode) %>%
filter(n() == 1)
dupes = df %>%
group_by(postCode) %>%
filter(n() > 1)
Unclear why someone edited this response. Maybe they hate tribbles
Upvotes: 3
Reputation: 76641
If you can do with a list of the two data.frames, which seems to be better than to have many related objects in the .GlobalEnv
, try split
.
f <- rev(cumsum(rev(duplicated(df$postCode))))
split(df, f)
#$`0`
# id postCode someThing someOtherThing
#3 3 E8 KAK mice rats
#4 4 VH3 2K2 humans whales
#
#$`1`
# id postCode someThing someOtherThing
#1 1 E3 4AX cats dogs
#2 2 E3 4AX elephants sheep
Upvotes: 0