Reputation: 613
I have a CSV file like
Identity,AdvertiserName,CampaignName,AdGroupName,Keyword,DestURL,KeystoneKW,,CampaignDuplicate,AdGroupDuplicate,CampaignLocation,,,,,,,,,
666,Bro Pest Control,cat|home & garden|pest control,kw|entry,Bro Pest Control,http://www.ci.com/profile/66/ab/brrd_pest_control.html,Pest Control,,NO,NO,"Ablle,Louna,United States",,,,,,,,,
447,Dist Tire Ctr Inc,cat|automotive sales & services|automotive repair,kw|entry,DisTire Ctr Inc,http://www.cit.com/profile/44/abbeville_la/discoutire_ctr_inc.html,Autepair,,NO,NO,"Abblle,Louana,United States",,,,,,,,,
6665,Best Control,geo|la|abbe la area,home & garden|pest control,Br Pest Control,http://www.cit.com/profile/66/abbee_la/broud_pest_control.html,Pest Control,,NO,NO,"A,Louisiana,United States",,,,,,,,,
My desired output is
Identity,AdvertiserName,CampaignName,AdGroupName,Keyword,DestURL,KeystoneKW,,CampaignDuplicate,AdGroupDuplicate,CampaignLocation
666,Broud Pest Control,cat|home & garden|pest control,kw|entry,Bssad Pest Control,http://www.cit.com/profile/666/abbeville_la/brrd_pest_control.html,Pest Control,NO,NO,"Abbe,Louiana,United States"
44,DiscTire Ctr Inc,cat|automotive sales & services|automotive repair,kw|entry,Discount Tire Ctr Inc,http://www.cit.com/profile/44/ab/discouctr_inc.html,Automotive Repair,NO,NO,"Abbe,Loua,United States"
The piece of code I am using is
mydf <- read.csv("C:/Users/Administrator/Downloads/FinalLocationList1.csv", header=FALSE, skip=1)
d <- setNames(mydf[,sapply(mydf, function(x) all(!is.na(x)))],names(n))
z <- mydf <- Filter(function(x)!all(is.na(x)), mydf)
Credit - Thomas
But the above is not taking care of the Header problem? How to solve it? New to R. Any help is appreciated.
Edit : Output of dput(mydf)
structure(list(V1 = c(666L, 447L, 6665L), V2 = structure(c(2L,
3L, 1L), .Label = c("Best Control", "Bro Pest Control", "Dist Tire Ctr Inc"
), class = "factor"), V3 = structure(c(2L, 1L, 3L), .Label = c("cat|automotive sales & services|automotive repair",
"cat|home & garden|pest control", "geo|la|abbe la area"), class = "factor"),
V4 = structure(c(2L, 2L, 1L), .Label = c("home & garden|pest control",
"kw|entry"), class = "factor"), V5 = structure(c(2L, 3L,
1L), .Label = c("Br Pest Control", "Bro Pest Control", "DisTire Ctr Inc"
), class = "factor"), V6 = structure(1:3, .Label = c("http://www.ci.com/profile/66/ab /brrd_pest_control.html",
"http://www.cit.com/profile/44/abbeville_la/discoutire_ctr_inc.html",
"http://www.cit.com/profile/66/abbee_la/broud_pest_control.html"
), class = "factor"), V7 = structure(c(2L, 1L, 2L), .Label = c("Autepair",
"Pest Control"), class = "factor"), V8 = c(NA, NA, NA), V9 = structure(c(1L,
1L, 1L), .Label = "NO", class = "factor"), V10 = structure(c(1L,
1L, 1L), .Label = "NO", class = "factor"), V11 = structure(c(3L,
2L, 1L), .Label = c("A,Louisiana,United States", "Abblle,Louana,United States",
"Ablle,Louna,United States"), class = "factor"), V12 = c(NA,
NA, NA), V13 = c(NA, NA, NA), V14 = c(NA, NA, NA), V15 = c(NA,
NA, NA), V16 = c(NA, NA, NA), V17 = c(NA, NA, NA), V18 = c(NA,
NA, NA), V19 = c(NA, NA, NA), V20 = c(NA, NA, NA)), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11",
"V12", "V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20"
), class = "data.frame", row.names = c(NA, -3L))
Error
Error in setNames(mydf[, sapply(mydf, function(x) all(!is.na(x)))], names(n)) :
'names' attribute [20] must be the same length as the vector [10]
Upvotes: 1
Views: 127
Reputation: 263481
Use colClasses="NULL"
for the columns you want to drop ... the last nine in this case, so rep("NULL",9)
tx <- 'Identity,AdvertiserName,CampaignName,AdGroupName,Keyword,DestURL,KeystoneKW,,CampaignDuplicate,AdGroupDuplicate,CampaignLocation,,,,,,,,,
666,Broud Pest Control,cat|home & garden|pest control,kw|entry,Bssad Pest Control,http://www.cit.com/profile/666/abbeville_la/brrd_pest_control.html,Pest Control,,NO,NO,"Abbe,Louiana,United States",,,,,,,,,
44,DiscTire Ctr Inc,cat|automotive sales & services|automotive repair,kw|entry,Discount Tire Ctr Inc,http://www.cit.com/profile/44/ab/discouctr_inc.html,Automotive Repair,,NO,NO,"Abbe,Loua,United States",,,,,,,,,'
df <- read.table(text=tx, sep=",",
colClasses=c("numeric", rep("character",10), rep("NULL",9)),
header=TRUE)
> str(df)
'data.frame': 2 obs. of 11 variables:
$ Identity : num 666 44
$ AdvertiserName : chr "Broud Pest Control" "DiscTire Ctr Inc"
$ CampaignName : chr "cat|home & garden|pest control" "cat|automotive sales & services|automotive repair"
$ AdGroupName : chr "kw|entry" "kw|entry"
$ Keyword : chr "Bssad Pest Control" "Discount Tire Ctr Inc"
$ DestURL : chr "http://www.cit.com/profile/666/abbeville_la/brrd_pest_control.html" "http://www.cit.com/profile/44/ab/discouctr_inc.html"
$ KeystoneKW : chr "Pest Control" "Automotive Repair"
$ X : chr "" ""
$ CampaignDuplicate: chr "NO" "NO"
$ AdGroupDuplicate : chr "NO" "NO"
$ CampaignLocation : chr "Abbe,Louiana,United States" "Abbe,Loua,United States"
Upvotes: 2
Reputation: 52687
You can try:
setNames(
Filter(function(x) !all(is.na(x)), mydf),
names(mydf)[-grep("^X(\\.[0-9]+)?$", names(mydf))]
)
Produces:
Identity Number Data Result Add
1 1 4 55 92 62
2 3 7 43 12 74
3 7 3 58 52 64
4 0 6 10 22 96
5 3 8 13 92 22
Filter
will keep all columns that are not all NA
. Then the grep piece relies on the names produced by read.CSV for blank columns (X, X.1, etc.) to filter out the wrong names. This should work generically.
EDIT: running with updated CSV produces:
> str(setNames(Filter(function(x) !all(is.na(x)), mydf), names(mydf)[-grep("^X(\\.[0-9]+)?", names(mydf))]))
'data.frame': 2 obs. of 10 variables:
$ Identity : int 666 44
$ AdvertiserName : Factor w/ 2 levels "Broud Pest Control",..: 1 2
$ CampaignName : Factor w/ 2 levels "cat|automotive sales & services|automotive repair",..: 2 1
$ AdGroupName : Factor w/ 1 level "kw|entry": 1 1
$ Keyword : Factor w/ 2 levels "Bssad Pest Control",..: 1 2
$ DestURL : Factor w/ 2 levels "http://www.cit.com/profile/44/ab/discouctr_inc.html",..: 2 1
$ KeystoneKW : Factor w/ 2 levels "Automotive Repair",..: 2 1
$ CampaignDuplicate: Factor w/ 1 level "NO": 1 1
$ AdGroupDuplicate : Factor w/ 1 level "NO": 1 1
$ CampaignLocation : Factor w/ 2 levels "Abbe,Loua,United States",..: 2 1
Upvotes: 0