user3188390
user3188390

Reputation: 613

How in R one can do away with empty columns?

I have a CSV file like

Identity,AdvertiserName,CampaignName,AdGroupName,Keyword,DestURL,KeystoneKW,,CampaignDuplicate,AdGroupDuplicate,CampaignLocation,,,,,,,,,
666,Bro Pest Control,cat|home & garden|pest control,kw|entry,Bro Pest Control,http://www.ci.com/profile/66/ab/brrd_pest_control.html,Pest Control,,NO,NO,"Ablle,Louna,United States",,,,,,,,,
447,Dist Tire Ctr Inc,cat|automotive sales & services|automotive repair,kw|entry,DisTire Ctr Inc,http://www.cit.com/profile/44/abbeville_la/discoutire_ctr_inc.html,Autepair,,NO,NO,"Abblle,Louana,United States",,,,,,,,,
6665,Best Control,geo|la|abbe la area,home & garden|pest control,Br Pest Control,http://www.cit.com/profile/66/abbee_la/broud_pest_control.html,Pest Control,,NO,NO,"A,Louisiana,United States",,,,,,,,,

My desired output is

 Identity,AdvertiserName,CampaignName,AdGroupName,Keyword,DestURL,KeystoneKW,,CampaignDuplicate,AdGroupDuplicate,CampaignLocation
666,Broud Pest Control,cat|home & garden|pest control,kw|entry,Bssad Pest Control,http://www.cit.com/profile/666/abbeville_la/brrd_pest_control.html,Pest Control,NO,NO,"Abbe,Louiana,United States"
44,DiscTire Ctr Inc,cat|automotive sales & services|automotive repair,kw|entry,Discount Tire Ctr Inc,http://www.cit.com/profile/44/ab/discouctr_inc.html,Automotive Repair,NO,NO,"Abbe,Loua,United States"

The piece of code I am using is

mydf <- read.csv("C:/Users/Administrator/Downloads/FinalLocationList1.csv", header=FALSE, skip=1)
d <- setNames(mydf[,sapply(mydf, function(x) all(!is.na(x)))],names(n))
z <- mydf <- Filter(function(x)!all(is.na(x)), mydf)

Credit - Thomas

But the above is not taking care of the Header problem? How to solve it? New to R. Any help is appreciated.

Edit : Output of dput(mydf)

structure(list(V1 = c(666L, 447L, 6665L), V2 = structure(c(2L, 
3L, 1L), .Label = c("Best Control", "Bro Pest Control", "Dist Tire Ctr Inc"
), class = "factor"), V3 = structure(c(2L, 1L, 3L), .Label = c("cat|automotive sales &   services|automotive repair", 
"cat|home & garden|pest control", "geo|la|abbe la area"), class = "factor"), 
V4 = structure(c(2L, 2L, 1L), .Label = c("home & garden|pest control", 
"kw|entry"), class = "factor"), V5 = structure(c(2L, 3L, 
1L), .Label = c("Br Pest Control", "Bro Pest Control", "DisTire Ctr Inc"
), class = "factor"), V6 = structure(1:3, .Label = c("http://www.ci.com/profile/66/ab /brrd_pest_control.html", 
"http://www.cit.com/profile/44/abbeville_la/discoutire_ctr_inc.html", 
"http://www.cit.com/profile/66/abbee_la/broud_pest_control.html"
), class = "factor"), V7 = structure(c(2L, 1L, 2L), .Label = c("Autepair", 
"Pest Control"), class = "factor"), V8 = c(NA, NA, NA), V9 = structure(c(1L, 
1L, 1L), .Label = "NO", class = "factor"), V10 = structure(c(1L, 
1L, 1L), .Label = "NO", class = "factor"), V11 = structure(c(3L, 
2L, 1L), .Label = c("A,Louisiana,United States", "Abblle,Louana,United States", 
"Ablle,Louna,United States"), class = "factor"), V12 = c(NA, 
NA, NA), V13 = c(NA, NA, NA), V14 = c(NA, NA, NA), V15 = c(NA, 
NA, NA), V16 = c(NA, NA, NA), V17 = c(NA, NA, NA), V18 = c(NA, 
NA, NA), V19 = c(NA, NA, NA), V20 = c(NA, NA, NA)), .Names = c("V1", 
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", 
"V12", "V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20"
), class = "data.frame", row.names = c(NA, -3L))

Error

Error in setNames(mydf[, sapply(mydf, function(x) all(!is.na(x)))], names(n)) : 
'names' attribute [20] must be the same length as the vector [10]

Upvotes: 1

Views: 127

Answers (2)

IRTFM
IRTFM

Reputation: 263481

Use colClasses="NULL" for the columns you want to drop ... the last nine in this case, so rep("NULL",9)

tx <- 'Identity,AdvertiserName,CampaignName,AdGroupName,Keyword,DestURL,KeystoneKW,,CampaignDuplicate,AdGroupDuplicate,CampaignLocation,,,,,,,,,
666,Broud Pest Control,cat|home & garden|pest control,kw|entry,Bssad Pest Control,http://www.cit.com/profile/666/abbeville_la/brrd_pest_control.html,Pest Control,,NO,NO,"Abbe,Louiana,United States",,,,,,,,,
44,DiscTire Ctr Inc,cat|automotive sales & services|automotive repair,kw|entry,Discount Tire Ctr Inc,http://www.cit.com/profile/44/ab/discouctr_inc.html,Automotive Repair,,NO,NO,"Abbe,Loua,United States",,,,,,,,,'

df <- read.table(text=tx, sep=",", 
                 colClasses=c("numeric", rep("character",10), rep("NULL",9)), 
                 header=TRUE)

> str(df)
'data.frame':   2 obs. of  11 variables:
 $ Identity         : num  666 44
 $ AdvertiserName   : chr  "Broud Pest Control" "DiscTire Ctr Inc"
 $ CampaignName     : chr  "cat|home & garden|pest control" "cat|automotive sales & services|automotive repair"
 $ AdGroupName      : chr  "kw|entry" "kw|entry"
 $ Keyword          : chr  "Bssad Pest Control" "Discount Tire Ctr Inc"
 $ DestURL          : chr  "http://www.cit.com/profile/666/abbeville_la/brrd_pest_control.html" "http://www.cit.com/profile/44/ab/discouctr_inc.html"
 $ KeystoneKW       : chr  "Pest Control" "Automotive Repair"
 $ X                : chr  "" ""
 $ CampaignDuplicate: chr  "NO" "NO"
 $ AdGroupDuplicate : chr  "NO" "NO"
 $ CampaignLocation : chr  "Abbe,Louiana,United States" "Abbe,Loua,United States"

Upvotes: 2

BrodieG
BrodieG

Reputation: 52687

You can try:

    setNames(
      Filter(function(x) !all(is.na(x)), mydf), 
      names(mydf)[-grep("^X(\\.[0-9]+)?$", names(mydf))]
    )

Produces:

  Identity Number Data Result Add
1        1      4   55     92  62
2        3      7   43     12  74
3        7      3   58     52  64
4        0      6   10     22  96
5        3      8   13     92  22

Filter will keep all columns that are not all NA. Then the grep piece relies on the names produced by read.CSV for blank columns (X, X.1, etc.) to filter out the wrong names. This should work generically.


EDIT: running with updated CSV produces:

> str(setNames(Filter(function(x) !all(is.na(x)), mydf), names(mydf)[-grep("^X(\\.[0-9]+)?", names(mydf))]))
'data.frame': 2 obs. of  10 variables:
 $ Identity         : int  666 44
 $ AdvertiserName   : Factor w/ 2 levels "Broud Pest Control",..: 1 2
 $ CampaignName     : Factor w/ 2 levels "cat|automotive sales & services|automotive repair",..: 2 1
 $ AdGroupName      : Factor w/ 1 level "kw|entry": 1 1
 $ Keyword          : Factor w/ 2 levels "Bssad Pest Control",..: 1 2
 $ DestURL          : Factor w/ 2 levels "http://www.cit.com/profile/44/ab/discouctr_inc.html",..: 2 1
 $ KeystoneKW       : Factor w/ 2 levels "Automotive Repair",..: 2 1
 $ CampaignDuplicate: Factor w/ 1 level "NO": 1 1
 $ AdGroupDuplicate : Factor w/ 1 level "NO": 1 1
 $ CampaignLocation : Factor w/ 2 levels "Abbe,Loua,United States",..: 2 1

Upvotes: 0

Related Questions