Reputation: 12141
I'm not sure if this can be achieved in Google Refine at all. But basically, I have data like this.
The first table is the table of all the users. The second table show all the friends. However, in the second table in "friends"
column not all the id exists in the first table which I want to get rid of. So, how can I search each id in friends
column in the second table and get rid of the id that doesn't exists in the table 1?
Upvotes: 3
Views: 961
Reputation: 10540
Put the two tables in different projects (we'll call them Table1
and Table2
).
In Table2
on on the friends
column:
cross(cell,'Table1','user_id').length()
This will return 0 if there's no match, 1 if there's a match or N>1 if there are duplicates in Table1
If you want the data back in the original format, set up a facet to filter on the validity column, blank out all the bad values and then use "join multi-valued cells" to reverse the split operation you did up front.
I fixed some caching bugs with cross() for OpenRefine 2.6, so if the cross doesn't work, try stopping and restarting the Refine server.
Upvotes: 4