Alex
Alex

Reputation: 55

Pandas read data by rows

Normally the data is presented with columns being the variables, but if for example I had in a .txt file something like

Data1,1,2,3,4,5
Data2,3,1,4

can I use pandas in such a way that it constructs the data frame as what I would obtain from

d = {"Data1":[1,2,3,4,5], "Data2": [3,1,4,1,2]}
df = pd.DataFrame(data=d)

Edit : Originally I wanted tha data sets to be different in size but pandas won't allow it throwing a ValueError

Upvotes: 3

Views: 3173

Answers (4)

timgeb
timgeb

Reputation: 78760

Read the file in, transpose the dataframe, munge a little... et voilà.

Setup for faking your file:

>>> from io import StringIO
>>> file = StringIO('''Data1,1,2,3,4,5
...:Data2,3,1,4''')

Create the dataframe:

>>> df = pd.read_csv(file, header=None).T
>>> df = df.rename(columns=df.loc[0]).drop(0, axis=0)
>>> 
>>> df
>>> 
  Data1 Data2
1     1     3
2     2     1
3     3     4
4     4   NaN
5     5   NaN

However, this is done a little more elegantly in this answer by Peter Leimbigler!

As ALollz has pointed out, these solutions only work when there is no row in your file with more fields than the first row.

Here's my attempt at a more generally applicable solution:

>>> file = StringIO('''Data1,3,1,4
...:Data2,1,2,3,4,5
...:Data3,7,8''')
>>> df = pd.concat([pd.Series(line.split(',')) for line in file], axis=1)
>>> df.rename(columns=df.loc[0]).drop(0, axis=0).astype(float)
>>> 
   Data1  Data2  Data3
1    3.0    1.0    7.0
2    1.0    2.0    8.0
3    4.0    3.0    NaN
4    NaN    4.0    NaN
5    NaN    5.0    NaN

Upvotes: 2

ALollz
ALollz

Reputation: 59579

In the case that your first row isn't the longest you can read in the entire line and then split it with pandas.

df = pd.read_csv('test.csv', sep='\n', header=None)
df[0].str.split(',', expand=True).set_index(0).rename_axis(None,0).T

  Data1 Data2 Data3
1     1     3     1
2     2     1     2
3     3     4     3
4     4  None     4
5     5  None     5
6  None  None     6
7  None  None     7
8  None  None     8
9  None  None     9

Sample Data: test.csv:

Data1,1,2,3,4,5
Data2,3,1,4
Data3,1,2,3,4,5,6,7,8,9

Upvotes: 2

Acccumulation
Acccumulation

Reputation: 3591

You can do data_dict = {line.split(',')[0]:{line.split(',')[1:] for line in lines} where lines is created by reading in the data. Then apply whatever rule you want to get [3,1,4,1,2] instead of [3,1,4] (you don't say what rule you want). Then do pd.DataFrame(data_dict).

Upvotes: 0

Peter Leimbigler
Peter Leimbigler

Reputation: 11105

df = pd.read_csv('example_data.txt', header=None, index_col=0).T
df
0  Data1  Data2
1    1.0    3.0
2    2.0    1.0
3    3.0    4.0
4    4.0    NaN
5    5.0    NaN

Upvotes: 2

Related Questions