Reputation: 1121
I have a large set of vertices/nodes that represent a set of graphs. Note that there could be many independent graphs within this complete set. The goal is to find the min number of vertices across all these graphs that correspond to the largest total sum of weights across all the edges captured by those selected vertices. I have the adjacency matrix in pandas and I am using networkx.
Below is a sample dataframe with the three columns where Number_Of_Trips is the weight. I could provide a weight of node = 10*trips in order to merge the two metrics together. I.e. maximizing # of Trips - 10*NumberOfNodes
Number_Of_Trips dropoff_gh7 pickup_gh7
0 304 9tbqhsx 9tbqj4g
1 271 9tbqj4f 9tbqhsx
2 263 9tbqt4s 9tbqhsx
3 258 9tbqdye 9tbqdsr
4 256 9tbqhgh 9tbqjfv
5 236 9tbqhsw 9tbqj4g
6 233 9tbqt4g 9tbqv03
7 229 9tbqhsx 9tbqj4c
8 218 9tbqy3f 9tbqt4s
9 213 9tbq5v4 9tbqh41
10 210 9tbqhgh 9tbqhsw
11 192 9tbqhgh 9tbqje4
12 186 9tbqy3f 9tbqt4g
13 184 9tbqhgh 9tbqj4z
14 183 9tbqe3d 9tbqe9e
15 170 9tbq3xn 9tbq39w
16 167 9tbq5bw 9tbqht6
17 163 9tbqhsx 9tbqh0x
18 162 9tbqdk1 9tbq7p2
19 160 9tbqsch 9tbqt4s
x = nx.from_pandas_dataframe(df,"dropoff_gh7","pickup_gh7","Number_Of_Trips")
graphs = list(nx.connected_component_subgraphs(x))
Upvotes: 0
Views: 275
Reputation: 1121
Note that one caveat to the question is that you could have multiple independent subgraphs within the graph that could be the solution. The key intuition to this solution is that the most likely candidates for subgraphs are the vertices that share a lot of edges with each other. It turns out that this is precisely what is evaluated when looking at Cliques in a graph. Hence, this solution simply extracts all the cliques and then rank orders them by the total num of weights represented by the vertices in the clique - the number of vertices * cost of a vertice. This can be quickly prototyped using NetworkX.
G = nx.from_pandas_dataframe(df, "dropoff_gh7", "pickup_gh7", ['num_of_trips'])
# Find all the cliques in the graph (not only maximal but all sub cliques as well. Note that clique finding is NP complete so this may take a long time if your graph is > 100k of edges or more. For <100k edges, this took within 5 mins on a 16GB macbook pro 3GHz machine.
cliques = nx.find_cliques(G)
clique_trips = [np.array([c,G.subgraph(c).size(weight="num_of_trips")]) for c in cliques]
df_cliques = pd.DataFrame(clique_trips,columns=["vertices","num_of_trips"])
df_cliques["num_vertices"] = df_cliques.apply(lambda x:len(x[0]), axis=1)
df_cliques["weighted_trips"] = df_cliques.apply(lambda row:
row["num_of_trips"] - row["num_vertices"]*COST_PER_NODE, axis=1)
df_cliques = df_cliques.sort_values("weighted_trips")[::-1]
df_cliques.head()
# The top N cliques can then be aggregated into a set to identify the precise vertices that are most valuable.
Upvotes: 0
Reputation: 77860
Here's an outline of the logic.
Create a cluster structure. A cluster has member nodes, an internal value (total internal trips), and edges to other clusters.
Start with each node in an individual cluster. Put all of these clusters into a "not done" list. You're now going to iterate through that list, merging clusters where you find advantage in doing so. Pick the first cluster in the list.
Iterate: For every edge of that cluster, check the net value of merging the cluster at the other end of that edge: internal trips + edge trips - 10*cluster population (quantity of nodes).
Merge: Concatenate the member-node lists of the two clusters. Add their internal values and the value of the edge between them. Adjust for the node population (if you're not already doing that accounting elsewhere). Merge the lists of edges to other clusters. Remove the merged cluster from the "not done" list.
Continue with this "Kleene Closure" process until you have no more nodes to add profitably. Move this resulting cluster to the "done" list. Pick the next node in the "not done" list and repeat the iterate & merge loop until the "done" list is empty.
Now, move the entire "done" list back to the "not done" list and repeat the process until you complete a pass with no further merges.
Is that detailed enough for you to code the process?
Upvotes: 1