Reputation: 2459
I have two CSV files: "userfeatures" and "itemfeatures". Each line in "userfeatures" describes one specific user. For example, the first line of the "userfeatures" file is:
"005c2e08","Action","nm0000148","dir_ nm0764316","USA"
I need to find the intersection of this line with every line of the second file, "itemfeatures". (Actually, I need to repeat this procedure for all the users, i.e., for all lines of "userfeatures".)
So, the first comparison will be with the first line of "itemfeatures", which is:
"tt0306047","Comedy,Action","nm0267506,nm0000221,nm0356021","dir_ nm0001878","USA"
The result of the intersection should be ["Action", "USA"]
but unfortunately, my code only finds ["USA"] as a match. Here is what I've tried so far:
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Main {
    public static void main(String[] args) throws Exception {
        BufferedReader userfeatures = new BufferedReader(new FileReader("userFeatureVectorsTest.csv"));
        BufferedReader itemfeatures = new BufferedReader(new FileReader("ItemFeatureVectorsTest.csv"));
        // Read the item file once; the reader would be exhausted after the first user line otherwise.
        List<String> itemlines = new ArrayList<>();
        for (String Iline = itemfeatures.readLine(); Iline != null; Iline = itemfeatures.readLine()) {
            itemlines.add(Iline);
        }
        itemfeatures.close();
        String Uline = null;
        while ((Uline = userfeatures.readLine()) != null) {
            // Compare the current user line with every item line.
            for (String Iline : itemlines) {
                System.out.println(Uline);
                System.out.println(Iline);
                System.out.println(intersect(Uline, Iline));
                System.out.println(union(Uline, Iline));
            }
        }
        userfeatures.close();
    }

    // Tokens that appear in both lines.
    static Set<String> intersect(String Uline, String Iline) {
        Set<String> result = new HashSet<String>(Arrays.asList(Uline.split(",")));
        Set<String> IlineSet = new HashSet<String>(Arrays.asList(Iline.split(",")));
        result.retainAll(IlineSet);
        return result;
    }

    // Tokens that appear in either line.
    static Set<String> union(String Uline, String Iline) {
        Set<String> result = new HashSet<String>(Arrays.asList(Uline.split(",")));
        Set<String> IlineSet = new HashSet<String>(Arrays.asList(Iline.split(",")));
        result.addAll(IlineSet);
        return result;
    }
}
I think the problem is related to Uline.split(",") and Iline.split(","), because they consider "Comedy,Action" as one word, and so it cannot find [Action] as the intersection of "Comedy,Action" and "Action".
I would appreciate any ideas on how to fix this issue.
Upvotes: 0
Views: 104
Reputation: 6456
If you print your line, what does it look like? I think your issue is in reading the file, for example:
"005c2e08","Action","nm0000148","dir_ nm0764316","USA"
splitting by ',' will result in the tokens:
["005c2e08"] ["Action"]
and so on, with the quotes still attached. For your second line it will be:
["tt0306047"] ["Comedy] [Action"]
This is why USA is intersecting, but Action is not.
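To see this concretely, here is a small sketch that applies the same split(",") to the two sample lines from the question and prints each token in brackets. The class name SplitDemo and the helper tokens are made up for illustration; only the split call comes from the question.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SplitDemo {
    // Same naive split as in the question: quotes are kept and
    // commas inside quoted fields are treated as separators too.
    static List<String> tokens(String line) {
        return Arrays.asList(line.split(","));
    }

    public static void main(String[] args) {
        String user = "\"005c2e08\",\"Action\",\"nm0000148\",\"dir_ nm0764316\",\"USA\"";
        String item = "\"tt0306047\",\"Comedy,Action\",\"nm0267506,nm0000221,nm0356021\","
                + "\"dir_ nm0001878\",\"USA\"";

        for (String t : tokens(user)) {
            System.out.println("user token: [" + t + "]");
        }
        for (String t : tokens(item)) {
            System.out.println("item token: [" + t + "]");
        }

        // Only tokens that are identical character for character survive.
        Set<String> common = new HashSet<>(tokens(user));
        common.retainAll(new HashSet<>(tokens(item)));
        System.out.println("intersection: " + common);
    }
}
```

The user line yields the token "Action" with both quotes, while the item line yields "Comedy with a leading quote and Action" with a trailing quote, so the only exact match left is "USA".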
Use a CSV reader to read in the CSV file, then split the attributes of each CSV line by comma. That way you get rid of the quotes and your code will work.
For example, this library is very handy for reading CSV files:
http://opencsv.sourceforge.net/
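As a rough illustration of the idea, the sketch below parses the quoted fields first and only then splits each field's values by comma. To stay self-contained it uses a small hand-written parser rather than opencsv, and it assumes every field is wrapped in double quotes with no escaped quotes inside; a real CSV library handles those cases. The names QuotedCsvDemo, parseFields and features are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class QuotedCsvDemo {
    // Splits a line on commas that are outside double quotes and drops the quotes.
    // Assumption: no escaped quotes ("") inside a field.
    static List<String> parseFields(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;            // entering or leaving a quoted field
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());  // field separator
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Flattens multi-valued fields such as "Comedy,Action" into single features.
    static Set<String> features(String line) {
        Set<String> result = new LinkedHashSet<>();
        for (String field : parseFields(line)) {
            for (String value : field.split(",")) {
                result.add(value.trim());
            }
        }
        return result;
    }

    static Set<String> intersect(String userLine, String itemLine) {
        Set<String> result = features(userLine);
        result.retainAll(features(itemLine));
        return result;
    }

    public static void main(String[] args) {
        String user = "\"005c2e08\",\"Action\",\"nm0000148\",\"dir_ nm0764316\",\"USA\"";
        String item = "\"tt0306047\",\"Comedy,Action\",\"nm0267506,nm0000221,nm0356021\","
                + "\"dir_ nm0001878\",\"USA\"";
        System.out.println(intersect(user, item)); // prints [Action, USA]
    }
}
```

Parsing into fields first also keeps the column boundaries available, in case the comparison should later be done column by column instead of over the whole line.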
Upvotes: 1
Reputation: 17534
Try removing the double quotes in both strings.
Because when you split
"tt0306047","Comedy,Action","nm0267506,nm0000221,nm0356021","dir_ nm0001878","USA"
you will get an Action" token, which will never match the "Action" token.
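A minimal sketch of that suggestion, assuming no value ever needs to contain a literal double quote: strip every quote before splitting, so that both lines produce plain tokens. The class name StripQuotesDemo and the helper tokens are invented for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class StripQuotesDemo {
    // Removes all double quotes, then splits on every comma.
    // Assumption: quotes only wrap fields and never carry meaning themselves.
    static Set<String> tokens(String line) {
        return new LinkedHashSet<>(Arrays.asList(line.replace("\"", "").split(",")));
    }

    static Set<String> intersect(String userLine, String itemLine) {
        Set<String> result = tokens(userLine);
        result.retainAll(tokens(itemLine));
        return result;
    }

    public static void main(String[] args) {
        String user = "\"005c2e08\",\"Action\",\"nm0000148\",\"dir_ nm0764316\",\"USA\"";
        String item = "\"tt0306047\",\"Comedy,Action\",\"nm0267506,nm0000221,nm0356021\","
                + "\"dir_ nm0001878\",\"USA\"";
        System.out.println(intersect(user, item)); // prints [Action, USA]
    }
}
```

This is the smallest change to the original intersect and union methods, at the cost of losing the column boundaries of the CSV line.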
Upvotes: 2