Reputation: 837
I have huge list of data frames and I want to set column names as per the the condition of the first column and fourth row. Please find my sample data below:
sample_data<-data.frame(a= c("\"\"","This","Week","1","2", "3", "4", "5", "6" ),b=c("\"\"" , "Last" , "Week" , "(1)" , "(2)" , "(3)" , "(4)" , "(5)" , "(6)" ) ,c=c("\"\"" , "End","2008" , "<1>", "<2>", "<3>", "<4>", "<5>", "<6>" ) ,
d=c("" , "Name" , "" , "Tiger Woods" , "Sergio Garcia" ,"Phil Mickelson" ,"Padraig Harrington", "Vijay Singh" ,"Robert Karlsson") ,e=c("", "Country" , "", "United States" ,"Spain", "United States" , "Ireland" , "Fiji" , "Sweden" ),
f=c("" , "Average" , "Points" ,"11.664" , "7.992" , "6.871" , "6.832" , "6.631" , "5.017" ),f=c("", "Total" ,"Points","466.573" ,"415.591" ,"336.674", "348.431", "358.071", "265.906" ))
another_data<-data.frame(a=c("\"\"","Last", "Week" ,"\"\"","(50)", "(55)","(51)","(52)"),b=c("\"\"" ,"End", "2011","\"\"", "<148>","<70>","<49>","<51>"),c=c("","Name", "\"\"" ,"" ,"Kyle Stanley" , "Kevin Na" ,"Gonzalo Fdez-Castano" ,"Ryo Ishikawa") ,
d=c("","State","\"\"","\"\"", "United States","United States" ,"Spain" ,"Japan"),e=c("\"\"" ,"Country" ,"\"\"","\"\"","United States", "United States" ,"Spain","Japan"),f=c("\"\"","Average" ,"Points" ,"","2.694" ,"2.560","2.544" ,"2.539"))
list_df<-list(sample_data,another_data)
list_df
[[1]]
a b c d e f f.1
1 "" "" ""
2 This Last End Name Country Average Total
3 Week Week 2008 Points Points
4 1 (1) <1> Tiger Woods United States 11.664 466.573
5 2 (2) <2> Sergio Garcia Spain 7.992 415.591
6 3 (3) <3> Phil Mickelson United States 6.871 336.674
7 4 (4) <4> Padraig Harrington Ireland 6.832 348.431
8 5 (5) <5> Vijay Singh Fiji 6.631 358.071
9 6 (6) <6> Robert Karlsson Sweden 5.017 265.906
[[2]]
a b c d e f
1 "" "" "" ""
2 Last End Name State Country Average
3 Week 2011 "" "" "" Points
4 "" "" "" ""
5 (50) <148> Kyle Stanley United States United States 2.694
6 (55) <70> Kevin Na United States United States 2.560
7 (51) <49> Gonzalo Fdez-Castano Spain Spain 2.544
8 (52) <51> Ryo Ishikawa Japan Japan 2.539
So what i want to do is to check if the fourth row contains ""
. If so, merge everything between the fourth row to first row of the data frame. On the other hand, if the fourth row does not contain ""
, then merge everything between third row to first row of the data frame.
Here is my trial:
for (index in 1:length(list_df))
{
if(list_df[[index]][4,1]=="")
{
list_df[[index]]<-setNames(as.data.frame(list_df[[index]][-(1:4), ]),
sapply(as.data.frame(list_df[[index]][1:4, ]), paste, collapse = ""))
}
else if(list_df[[index]][4,1]!="")
{
list_df[[index]]<-setNames(as.data.frame(list_df[[index]][-(1:3), ]),
sapply(as.data.frame(list_df[[index]][1:3, ]), paste, collapse = ""))
}
{
return(list_df)
}
}
But this is not giving me the desired output. Working for the first condition only but not considering the second condition.
Here is the output I am getting:
list_df
[[1]]
ThisWeek LastWeek End2008 Name Country AveragePoints TotalPoints
4 1 (1) <1> Tiger Woods United States 11.664 466.573
5 2 (2) <2> Sergio Garcia Spain 7.992 415.591
6 3 (3) <3> Phil Mickelson United States 6.871 336.674
7 4 (4) <4> Padraig Harrington Ireland 6.832 348.431
8 5 (5) <5> Vijay Singh Fiji 6.631 358.071
9 6 (6) <6> Robert Karlsson Sweden 5.017 265.906
[[2]]
a b c d e f
1
2 Last End Name Country Country Average
3 Week 2011 Points
4
5 (50) <148> Kyle Stanley United States United States 2.694
6 (55) <70> Kevin Na United States United States 2.560
7 (51) <49> Gonzalo Fdez-Castano Spain Spain 2.544
8 (52) <51> Ryo Ishikawa Japan Japan 2.539
However, the desired output is as follows:
list_df
[[1]]
ThisWeek LastWeek End2008 Name Country AveragePoints TotalPoints
4 1 (1) <1> Tiger Woods United States 11.664 466.573
5 2 (2) <2> Sergio Garcia Spain 7.992 415.591
6 3 (3) <3> Phil Mickelson United States 6.871 336.674
7 4 (4) <4> Padraig Harrington Ireland 6.832 348.431
8 5 (5) <5> Vijay Singh Fiji 6.631 358.071
9 6 (6) <6> Robert Karlsson Sweden 5.017 265.906
[[2]]
LastWeek End2011 Name State Country AveragePoints
5 (50) <148> Kyle Stanley United States United States 2.694
6 (55) <70> Kevin Na United States United States 2.560
7 (51) <49> Gonzalo Fdez-Castano Spain Spain 2.544
8 (52) <51> Ryo Ishikawa Japan Japan 2.539
I am so perplexed by this. I don't know why one of the conditions are being neglected. Can anyone help me with this please?
Upvotes: 2
Views: 57
Reputation: 388982
Usually it is easy to iterate over the list with lapply
. You can try :
lapply(list_df, function(x) {
if (x[4, 1] == '""') {
names(x) = gsub('"', '', trimws(sapply(x[1:4, ], paste, collapse = "")))
x[-c(1:4), ]
}
else {
names(x) = gsub('"', '', trimws(sapply(x[1:3, ], paste, collapse = "")))
x[-c(1:3), ]
}
})
#[[1]]
# ThisWeek LastWeek End2008 Name Country AveragePoints TotalPoints
#4 1 (1) <1> Tiger Woods United States 11.664 466.573
#5 2 (2) <2> Sergio Garcia Spain 7.992 415.591
#6 3 (3) <3> Phil Mickelson United States 6.871 336.674
#7 4 (4) <4> Padraig Harrington Ireland 6.832 348.431
#8 5 (5) <5> Vijay Singh Fiji 6.631 358.071
#9 6 (6) <6> Robert Karlsson Sweden 5.017 265.906
#[[2]]
# LastWeek End2011 Name State Country AveragePoints
#5 (50) <148> Kyle Stanley United States United States 2.694
#6 (55) <70> Kevin Na United States United States 2.560
#7 (51) <49> Gonzalo Fdez-Castano Spain Spain 2.544
#8 (52) <51> Ryo Ishikawa Japan Japan 2.539
As far as your for
loop is concerned the main reason why it is not working is because you are looking for wrong character. The value at [4, 1]
is ""
and not empty string (Check another_data[4, 1] == ''
vs
another_data[4, 1] == '""'
). If you change that you should get the desired output. However, I did some additional changes in your for
loop.
else if
condition as it is not needed.gsub
as in my answer to remove quotes from column namesreturn(list_df)
doesn't make sense since we are already changing the list_df
in for
loop so removed it.for (index in 1:length(list_df)) {
if(list_df[[index]][4,1] == '""') {
list_df[[index]]<-setNames(as.data.frame(list_df[[index]][-(1:4), ]),
gsub('"', '',sapply(as.data.frame(list_df[[index]][1:4, ]),
paste, collapse = "")))
}
else{
list_df[[index]]<-setNames(as.data.frame(list_df[[index]][-(1:3), ]),
gsub('"', '', sapply(as.data.frame(list_df[[index]][1:3, ]),
paste, collapse = "")))
}
}
Upvotes: 2