Wednesday, 14 August 2013

Recommended way to create a matrix containing strings in Python

I need to write a program that collects different datasets and merges
them. For this I have to read in a comma-separated matrix in which each
row represents an instance (here, a protein) and each column represents
an attribute of the instances. If an instance has an attribute, the cell
contains a 1, otherwise a 0. The matrix looks like the example given
below, but is much larger, with 35000 instances and hundreds of
attributes.

Proteins,Attribute 1,Attribute 2,Attribute 3,Attribute 4
Protein 1,1,1,1,0
Protein 2,0,1,0,1
Protein 3,1,0,0,0
Protein 4,1,1,1,0
Protein 5,0,0,0,0
Protein 6,1,1,1,1

I need a way to store the matrix before writing it to a new file together
with other information about the instances. I thought of using numpy
arrays, since I would like to be able to select and check single columns.
I tried to use numpy.empty to create an array of the given size, but it
seems that you have to fix the length of the strings in advance and
cannot change it afterwards.
Is there a better way to deal with such data? I also thought of
dictionaries of lists, but then I cannot select single columns.
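
One possible approach, given only as a rough sketch: since the only strings are the row and column labels, they could be kept in ordinary Python lists while the 0/1 values go into a separate numeric numpy array. That sidesteps the fixed string length and still allows column selection. The sketch assumes Python 3 with numpy installed; read_matrix and SAMPLE are hypothetical names made up for illustration, and the sample text is the small example above rather than the real file.

```python
import csv
import io

import numpy as np

# Small example from the question; a real run would pass an open file instead.
SAMPLE = """Proteins,Attribute 1,Attribute 2,Attribute 3,Attribute 4
Protein 1,1,1,1,0
Protein 2,0,1,0,1
Protein 3,1,0,0,0
Protein 4,1,1,1,0
Protein 5,0,0,0,0
Protein 6,1,1,1,1
"""


def read_matrix(handle):
    """Hypothetical helper: return (row_names, column_names, values)."""
    reader = csv.reader(handle)
    header = next(reader)
    column_names = header[1:]      # skip the "Proteins" label column
    row_names = []
    rows = []
    for record in reader:
        if not record:             # ignore blank lines
            continue
        row_names.append(record[0])
        rows.append([int(cell) for cell in record[1:]])
    # Only the 0/1 flags go into the array, so no string length is needed.
    values = np.array(rows, dtype=np.uint8)
    return row_names, column_names, values


row_names, column_names, values = read_matrix(io.StringIO(SAMPLE))

# Select a single column by its header name.
col = column_names.index("Attribute 2")
attribute_2 = values[:, col]

# Names of the proteins that have this attribute.
with_attribute_2 = [name for name, flag in zip(row_names, attribute_2) if flag]
print(with_attribute_2)
```

With 35000 rows and a few hundred columns, a uint8 array of this shape should take on the order of tens of megabytes at most, and the labels can be written back out alongside the values when producing the new file.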
