Benjamin
Benjamin

Reputation: 721

Create tables based on unique and nonunique values in a single column

Given a CSV with the following structure,

id, postCode, someThing, someOtherThing
1,E3 4AX, cats, dogs
2,E3 4AX, elephants, sheep
3,E8 KAK, mice, rats
4,VH3 2K2, humans, whales

I wish to create two tables, based on whether the value in the postCode column is unique or not. The values of the other columns do not matter to me, but they have to be copied to the new tables.

My end data should look like this, with one table based on unique postCodes:

id, postCode, someThing, someOtherThing
3,E8 KAK, mice, rats
4,VH3 2K2, humans, whales

And another where postCode values are duplicated

id, postCode, someThing, someOtherThing    
1,E3 4AX, cats, dogs
2,E3 4AX, elephants, sheep

So far I can load the data but I'm not sure of the next step:

myData <- read.csv("path/to/my.csv",
  header=TRUE,
  sep=",",
  stringsAsFactors=FALSE
)

New to R so help appreciated.

Data in dput format.

df <-
structure(list(id = 1:4, postCode = structure(c(1L, 1L, 2L, 3L
), .Label = c("E3 4AX", "E8 KAK", "VH3 2K2"), class = "factor"), 
someThing = structure(c(1L, 2L, 4L, 3L), .Label = c(" cats", 
" elephants", " humans", " mice"), class = "factor"), 
someOtherThing = structure(c(1L, 3L, 2L, 4L), 
.Label = c(" dogs", " rats", " sheep", " whales               "
), class = "factor")), class = "data.frame", 
row.names = c(NA, -4L))

Upvotes: 0

Views: 40

Answers (2)

svenhalvorson
svenhalvorson

Reputation: 1080

If df is the name of your data.frame, which can be formed as:

df <- read.table(header = T, text = "
id, postCode, someThing, someOtherThing
1, E3 4AX, cats, dogs
2, E3 4AX, elephants, sheep
3, E8 KAK, mice, rats
4, VH3 2K2, humans, whales
       ")

Then the uniques and duplicates can be found using the funciton n(), which collects the number of observation per grouped variable. Then,

uniques = df %>%
  group_by(postCode) %>%
  filter(n() == 1)

dupes = df %>%
  group_by(postCode) %>%
  filter(n() > 1)

Unclear why someone edited this response. Maybe they hate tribbles

Upvotes: 3

Rui Barradas
Rui Barradas

Reputation: 76641

If you can do with a list of the two data.frames, which seems to be better than to have many related objects in the .GlobalEnv, try split.

f <- rev(cumsum(rev(duplicated(df$postCode))))
split(df, f)
#$`0`
#  id postCode someThing         someOtherThing
#3  3   E8 KAK      mice                   rats
#4  4  VH3 2K2    humans  whales               
#
#$`1`
#  id postCode  someThing someOtherThing
#1  1   E3 4AX       cats           dogs
#2  2   E3 4AX  elephants          sheep

Upvotes: 0

Related Questions