Reputation: 21
I have a positional text file that has the related data split into two lines.
Column 1Column 2Column 3
Text
11 12 13
text for 1
21 22 23
text for 2
31 32 33
text for 3
41 42 43
text for 4
51 52 53
text for 5
I'm trying to get this into a dataframe like this:
Column 1Column 2Column 3 Text
11 12 13 text for 1
21 22 23 text for 2
31 32 33 text for 3
41 42 43 text for 4
51 52 53 text for 5
I'm testing without the column headers
import pandas as pd
# fn is the path to the text file
cols = [(0, 8), (8, 16), (16, None), (0, 50)]
rs = pd.read_fwf(fn, colspecs=cols, header=None)
gives me:
0 1 2 3
0 11 12 13.0 11 12 13
1 text for 1 NaN text for 1
2 21 22 23.0 21 22 23
3 text for 2 NaN text for 2
Is there any way to alternate the formats of the lines?
Upvotes: 2
Views: 635
Reputation: 71580
You can try to get every other row, and join the Text into one string as a new Text column, like this:
data = df.values.tolist()[::2][1:]
df = df[1:]
df = pd.DataFrame(df.values.tolist()[::2], columns=df.columns)
df['Text'] = [' '.join([str(x) for x in i[:-1]]) for i in data]
df = df.drop('3', axis=1)
print(df)
Result:
Column 1Column 2Column Text
0 11 12 13.0 text for 1.0
1 21 22 23.0 text for 2.0
2 31 32 33.0 text for 3.0
3 41 42 43.0 text for 4.0
4 51 52 53.0 text for 5.0
The first line of code creates a new list with the values of every other row in df, the second line removes the first row from df, then the third line creates a new dataframe with every other row. The fourth line creates the Text column with a list comprehension over the data list, the fifth line drops the column named '3', because it's extra, and the sixth line prints the data out.
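As an alternative, here is a minimal sketch that pairs the lines up before building the dataframe, rather than fixing the dataframe afterwards. It assumes the header lines have been removed (as in the question's test), that the file strictly alternates between a numeric line and a text line, and that the numeric values are whitespace-separated integers. The name read_paired_lines and the sample string are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample mirroring the question's file, without the header lines.
sample = """11 12 13
text for 1
21 22 23
text for 2
31 32 33
text for 3
"""


def read_paired_lines(handle):
    """Combine each numeric line with the text line that follows it."""
    # Drop blank lines so a trailing newline does not break the pairing.
    lines = [line.rstrip("\n") for line in handle if line.strip()]
    numeric_lines = lines[0::2]  # 1st, 3rd, 5th, ... line
    text_lines = lines[1::2]     # 2nd, 4th, 6th, ... line

    # Assumption: numeric fields are whitespace-separated integers.
    frame = pd.DataFrame(
        [[int(value) for value in line.split()] for line in numeric_lines],
        columns=["Column 1", "Column 2", "Column 3"],
    )
    frame["Text"] = [text.strip() for text in text_lines]
    return frame


result = read_paired_lines(io.StringIO(sample))
print(result)
```

With a real file, open(fn) can be passed in place of the io.StringIO object. If the file has an odd number of non-blank lines, assigning the Text column raises a length-mismatch error, which at least makes a broken pairing visible.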
Upvotes: 1