A A

Reputation: 1

I want to create an adjacency matrix using Python

I have a very large data set: almost 450,000 lines with two columns. I want to compute an adjacency matrix using Python, because I previously tried to do it in MATLAB and got a memory error due to the size of the data. My data values also start at 100 and go up to 450000.

Can anyone help me with this issue? I am new to Python.

I first have to import the file into Python (from an Excel sheet or a plain text file) and then compute the adjacency matrix.

Upvotes: 0

Views: 4930

Answers (3)

notablytipsy

Reputation: 387

I would use defaultdict - it is simple to use and takes just a few lines of code. I'm assuming your file looks like this:

a b
c d

First, put it into a list (see http://docs.python.org/2/library/fileinput.html) so that the format is [(a, b), (c, d)].
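A minimal sketch of that first step, assuming a whitespace-separated file with one edge per line (plain open() rather than fileinput; the filename edges.txt is hypothetical):

# Read each line "a b" into a list of (a, b) integer tuples.
edges = []
with open('edges.txt') as f:
    for line in f:
        a, b = line.split()
        edges.append((int(a), int(b)))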

Then, use defaultdict:

from collections import defaultdict

# Missing keys default to 0, so an absent edge reads as 0.
adjmat = defaultdict(int)
for edge in edges:
    adjmat[edge] = 1

adjmat[a, b] will return 1 if the edge exists and 0 otherwise. If you can have multiple edges between nodes, change that line to adjmat[edge] += 1, and adjmat[a, b] will return the number of edges connecting a and b.
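For example, once adjmat is built (the node IDs here are hypothetical):

print(adjmat[100, 245])  # 1 if the file contains the edge 100 245
print(adjmat[100, 300])  # 0 otherwise (the defaultdict default)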

Upvotes: 0

Judge Maygarden

Reputation: 27573

If I understand your question correctly, you require more memory than is available in RAM. Even with virtual memory, you probably can't allocate a block that big. Therefore, the solution is to write the adjacency matrix to a file as you build it. This method would work in either MATLAB or Python.
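A minimal sketch of that idea in Python, using a NumPy memmap so the matrix lives on disk rather than in RAM (the filenames and node count are assumptions; note that a byte-per-entry matrix for 450,000 nodes is roughly 200 GB on disk, so a sparse format may still be preferable):

import numpy as np

n = 450001  # assumed: node IDs run from 100 up to 450000
# File-backed matrix: entries are written to disk, not held in RAM.
adj = np.memmap('adjmat.dat', dtype=np.uint8, mode='w+', shape=(n, n))

with open('edges.txt') as f:  # hypothetical edge list, one "a b" per line
    for line in f:
        a, b = map(int, line.split())
        adj[a, b] = 1

adj.flush()  # push pending changes out to the file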


I am assuming you are processing CAIDA's Router-Level Topology Measurements, since the format seems to match your description. Each line of that file contains an edge of the graph from one IP router (column 1) to another (column 2). A full adjacency matrix of the 192244 nodes would require 4.3 GB even if you use only a single bit per entry. I would still suggest writing the matrix directly to a file instead of building it in memory.
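A quick sanity check on that figure:

n = 192244
bits = n * n                # one bit per matrix entry
print(bits / 8.0 / 2**30)   # about 4.3 (GiB)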

Upvotes: 1

tixxit

Reputation: 4122

The simplest way? Well, if you have over 10,000 nodes but only 450,000 edges, use SciPy's sparse matrix:

http://www.scipy.org/SciPy_Tutorial#head-c60163f2fd2bab79edd94be43682414f18b90df7

SciPy provides several sparse storage formats to keep the actual in-memory size of the matrix down (since the matrix entries will be mostly zeros). I'm sure MATLAB provides a space-conscious sparse matrix data structure as well.
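A minimal sketch, assuming the edge list is in a whitespace-separated file (edges.txt and the node count are hypothetical):

import numpy as np
from scipy.sparse import coo_matrix

data = np.loadtxt('edges.txt', dtype=int)  # shape (num_edges, 2)
rows, cols = data[:, 0], data[:, 1]

n = 450001  # assumed: node IDs run up to 450000
adj = coo_matrix((np.ones(len(rows), dtype=np.int8), (rows, cols)),
                 shape=(n, n)).tocsr()

print(adj[rows[0], cols[0]])  # 1 for any edge present in the file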

If you just want to know how to read in the file, I'd suggest you save it as a CSV or a plain text file (there is no real benefit to storing the data in an Excel file). Python comes with a library for reading/writing CSV files:

http://docs.python.org/library/csv.html
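For example, assuming the two columns are saved comma-separated in a file named edges.csv (a hypothetical name):

import csv

edges = []
with open('edges.csv') as f:
    for row in csv.reader(f):
        edges.append((int(row[0]), int(row[1])))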

If you really want to use XLS files, you can use pyExcelerator (I've never used it) - http://sourceforge.net/projects/pyexcelerator/ - or OpenOffice.org + PyUNO, or MS Office + COM.

Upvotes: 0
