I am trying to load a public .txt file into a pandas DataFrame so that I can later run named entity recognition on the German text. The original .txt file has the structure # text [date], followed by a number (the position in the sentence), a word, and its named-entity tags, with all fields separated by tabs. So the structure is:
text [21-03-1991] 1 Aufgrund O O 2 des O O # text [22-04-1993] 1 Aber O P
Does anyone have an idea how I can get it into this format:
Aufgrund O O
des O O
Aber O P
In the best case, every #-separated block would go into a new column.
I would like to use
pd.read_csv(...)
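If "every # in a new column" means that each #-separated record should land in its own column, one possible sketch is to read whole lines and split them at the #. This sidesteps read_csv, which would split on every tab. The inline SAMPLE string and the tab separators between fields are assumptions based on the description above, not the real file.

```python
import pandas as pd

# Hypothetical sample in the format described above; tabs between fields are assumed.
SAMPLE = (
    "text [21-03-1991]\t1\tAufgrund\tO\tO\t2\tdes\tO\tO\t#\t"
    "text [22-04-1993]\t1\tAber\tO\tP\n"
)

# One Series entry per line of the file
lines = pd.Series(SAMPLE.splitlines())

# Split every line at '#' so each record lands in its own column,
# then trim the tab left on either side of the '#'
blocks = lines.str.split('#', expand=True).apply(lambda col: col.str.strip('\t'))
print(blocks)
```

Each cell still holds a tab-separated record, so the word and tag triples would need a second parsing step.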
Text file example
text [21-03-1991] 1 Aufgrund O O 2 des O O # text [22-04-1991] 1 Aber O P text [21-04-1992] 2 Aufgrund O O 3 des O O # text [22-04-1992] 1 Aber O P text [21-06-1993] 3 Aufgrund O O 5 des O O # text [22-04-1993] 1 Aber O P
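import pandas as pd

# Read the tab-separated text file; header=None keeps the first line as data
# (by default read_csv would turn it into column names and drop that record)
df = pd.read_csv("source.txt", sep='\t', header=None)

# Column positions of each (word, tag, tag) triple in a line:
# 3-5 -> first word, 7-9 -> second word, 14-16 -> word after the '#'
triples = [[3, 4, 5], [7, 8, 9], [14, 15, 16]]
parts = [df.iloc[:, cols].set_axis(['V1', 'V2', 'V3'], axis=1) for cols in triples]

# DataFrame.append was removed in pandas 2.0, so stack the parts with pd.concat
final_df = pd.concat(parts, ignore_index=True)
print(final_df)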
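The column indices above only work for lines with exactly two words before the # and one after it. A more general sketch walks through each line's fields instead, assuming every record is "text", "[date]", then groups of four fields (position, word, tag, tag), with records separated by "#". That layout is inferred from the indices above, not stated in the post. The helper name parse_line and the inline SAMPLE data are hypothetical stand-ins for source.txt. Note that read_csv raises a tokenizing error if a later line has more fields than the first one, so very ragged files may be easier to split line by line.

```python
import io
import pandas as pd

# Hypothetical stand-in for source.txt: tab-separated, "text" and date in separate fields
SAMPLE = (
    "text\t[21-03-1991]\t1\tAufgrund\tO\tO\t2\tdes\tO\tO\t#\ttext\t[22-04-1991]\t1\tAber\tO\tP\n"
    "text\t[21-04-1992]\t2\tAufgrund\tO\tO\t3\tdes\tO\tO\t#\ttext\t[22-04-1992]\t1\tAber\tO\tP\n"
)


def parse_line(tokens):
    """Hypothetical helper: turn one line's fields into (date, word, tag1, tag2) tuples."""
    rows = []
    record = []
    # A trailing '#' flushes the last record of the line
    for tok in tokens + ['#']:
        if tok == '#':
            if record:
                date = record[1].strip('[]')  # '[21-03-1991]' -> '21-03-1991'
                body = record[2:]             # groups of (position, word, tag1, tag2)
                for i in range(0, len(body) - 3, 4):
                    _, word, tag1, tag2 = body[i:i + 4]
                    rows.append((date, word, tag1, tag2))
            record = []
        else:
            record.append(tok)
    return rows


# dtype=str keeps positions such as '1' as plain strings
df = pd.read_csv(io.StringIO(SAMPLE), sep='\t', header=None, dtype=str)

rows = []
for _, line in df.iterrows():
    # dropna() removes padding cells when a line is shorter than the widest one
    rows.extend(parse_line(line.dropna().tolist()))

final_df = pd.DataFrame(rows, columns=['date', 'word', 'tag1', 'tag2'])
print(final_df)
```

Unlike the stacked version above, this keeps the rows in file order and records which dated block each word came from.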