Reputation: 55
Normally the data is presented with columns being the variables, but if for example I had in a .txt file something like
Data1,1,2,3,4,5
Data2,3,1,4
can I use pandas in such a way that it constructs the data frame as what I would obtain from
d = {"Data1":[1,2,3,4,5], "Data2": [3,1,4,1,2]}
df = pd.DataFrame(data=d)
Edit :
Originally I wanted tha data sets to be different in size but pandas won't allow it throwing a ValueError
Upvotes: 3
Views: 3173
Reputation: 78760
Read the file in, transpose the dataframe, munge a little... et voilà.
Setup for faking your file:
>>> from io import StringIO
>>> file = StringIO('''Data1,1,2,3,4,5
...:Data2,3,1,4''')
Create the dataframe:
>>> df = pd.read_csv(file, header=None).T
>>> df = df.rename(columns=df.loc[0]).drop(0, axis=0)
>>>
>>> df
>>>
Data1 Data2
1 1 3
2 2 1
3 3 4
4 4 NaN
5 5 NaN
However, this is done a little more elegantly in this answer by Peter Leimbigler!
As ALollz has pointed out, these solutions only work when there is no row in your file with more fields than the first row.
Here's my attempt at a more generally applicable solution:
>>> file = StringIO('''Data1,3,1,4
...:Data2,1,2,3,4,5
...:Data3,7,8''')
>>> df = pd.concat([pd.Series(line.split(',')) for line in file], axis=1)
>>> df.rename(columns=df.loc[0]).drop(0, axis=0).astype(float)
>>>
Data1 Data2 Data3
1 3.0 1.0 7.0
2 1.0 2.0 8.0
3 4.0 3.0 NaN
4 NaN 4.0 NaN
5 NaN 5.0 NaN
Upvotes: 2
Reputation: 59579
In the case that your first row isn't the longest you can read in the entire line and then split it with pandas.
df = pd.read_csv('test.csv', sep='\n', header=None)
df[0].str.split(',', expand=True).set_index(0).rename_axis(None,0).T
Data1 Data2 Data3
1 1 3 1
2 2 1 2
3 3 4 3
4 4 None 4
5 5 None 5
6 None None 6
7 None None 7
8 None None 8
9 None None 9
test.csv
:Data1,1,2,3,4,5
Data2,3,1,4
Data3,1,2,3,4,5,6,7,8,9
Upvotes: 2
Reputation: 3591
You can do data_dict = {line.split(',')[0]:{line.split(',')[1:] for line in lines}
where lines
is created by reading in the data. Then apply whatever rule you want to get [3,1,4,1,2]
instead of [3,1,4]
(you don't say what rule you want). Then do pd.DataFrame(data_dict)
.
Upvotes: 0
Reputation: 11105
df = pd.read_csv('example_data.txt', header=None, index_col=0).T
df
0 Data1 Data2
1 1.0 3.0
2 2.0 1.0
3 3.0 4.0
4 4.0 NaN
5 5.0 NaN
Upvotes: 2