Reputation: 490
I have been given some data in a text format that I would like to convert into a dataframe:
text <- "
VALUE Ethnic
1 = 'White - British'
2 = 'White - Irish'
9 = 'White - Other'
;
"
I'd like to convert this into a dataframe with one column for the leading number and one column for the text in the quoted string. In this case, that would be two columns and three rows.
Upvotes: 2
Views: 47
Reputation: 160
import pandas as pd

years_list = list(range(1986,2020))
columns_width = [(0,2),(2,10),(10,12),(12,24),(24,27),(27,39),(39,49),(49,52),(52,56),(56,69),(69,82), (82,95),(95,108),(108,121),(121,134),(134,147),(147,152),(152,170),(170,188),(188,201), (201,202),(202,210),(210,217),(217,230),(230,242),(242,245)]
columns_header = ['Register Type','Trading Date','BDI Code','Negociation Code','Market Type','Trade Name', 'Specification','Forward Market Term In Days','Currency','Opening Price','Max. Price', 'Min. Price','Mean Price','Last Trade Price','Best Purshase Order Price', 'Best Purshase Sale Price','Numbor Of Trades','Number Of Traded Stocks', 'Volume Of Traded Stocks','Price For Options Market Or Secondary Term Market', 'Price Corrections For Options Market Or Secondary Term Market', 'Due Date For Options Market Or Secondary Term Market','Factor Of Paper Quotatuion', 'Points In Price For Options Market Referenced In Dollar Or Secondary Term', 'ISIN Or Intern Code ','Distribution Number']
years_concat = pd.DataFrame()
for year in years_list:
    time_serie = pd.read_fwf('/kaggle/input/bmfbovespas-time-series-19862019/COTAHIST_A'+str(year)+'.txt',
                             header=None, colspecs=columns_width)
    # delete the first and the last lines containing identifiers
    # use the two commented lines below to see them (requires numpy imported as np)
    # output = pd.DataFrame(np.array([time_serie.iloc[0],time_serie.iloc[-1]]))
    # output
    time_serie = time_serie.drop(time_serie.index[0])
    time_serie = time_serie.drop(time_serie.index[-1])
    years_concat = pd.concat([years_concat,time_serie],ignore_index=True)
years_concat.columns = columns_header
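For the text in the question itself, a comparable approach in Python could look like the sketch below. It is only an illustration using the standard library: the regular expression and the names `pattern` and `rows` are my own assumptions, and it assumes every data line has the form `number = 'label'`.

```python
import re

text = """
VALUE Ethnic
1 = 'White - British'
2 = 'White - Irish'
9 = 'White - Other'
;
"""

# Assumed line format: <number> = '<label>'
pattern = re.compile(r"^\s*(\d+)\s*=\s*'(.*)'\s*$")

rows = []
for line in text.splitlines():
    match = pattern.match(line)
    # The header line, blank lines and the ';' terminator do not match and are skipped
    if match:
        rows.append((int(match.group(1)), match.group(2)))

print(rows)
# [(1, 'White - British'), (2, 'White - Irish'), (9, 'White - Other')]
```

If pandas is available, `pd.DataFrame(rows, columns=["VALUE", "Ethnic"])` should turn these tuples into a two-column, three-row dataframe.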
Upvotes: 0
Reputation: 886938
library(tidyr)
library(dplyr)
tibble(text = trimws(text)) %>%
  separate_rows(text, sep = "\n") %>%
  filter(text != ";") %>%
  slice(-1) %>%
  separate(text, into = c("VALUE", "Ethnic"), sep = "\\s+=\\s+")
-output
# A tibble: 3 × 2
  VALUE Ethnic
  <chr> <chr>
1 1     'White - British'
2 2     'White - Irish'
3 9     'White - Other'
Or in base R
read.table(text = gsub("=", " ", trimws(text,
    whitespace = "\n(;\n)*"), fixed = TRUE), header = TRUE)
-output
  VALUE          Ethnic
1     1 White - British
2     2   White - Irish
3     9   White - Other
Upvotes: 1