Practical-4
Aim: Visual programming with the Orange tool.
Theory:
Visual programming
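Visual programming is a type of programming that lets people describe processes with illustrations instead of textual code. For example, to create a to-do list app with a visual programming tool, the programmer draws out the flow of the app; the resulting flowchart describes the screens, the user interactions, and what happens to the data at each stage. Orange works the same way: a workflow is built by placing widgets on a canvas and connecting the output of one widget to the input of another.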
Data Sampler:
Inputs
- Data: input dataset
Outputs
- Data Sample: sampled data instances
- Remaining Data: out-of-sample data
· Information on the input and output datasets.
· The desired sampling method:
- Fixed proportion of data returns a chosen percentage of the entire data (e.g. 70% of all the data).
- Fixed sample size returns a selected number of data instances. If Sample with replacement is enabled, every draw is made from the entire dataset, so the same instance can be selected more than once.
- Cross Validation partitions data instances into the specified number of complementary subsets. Following a typical validation schema, all subsets except the one selected by the user are output as Data Sample, and the selected subset goes to Remaining Data. (Note: In older versions, the outputs were swapped. If the widget is loaded from an older workflow, it switches to compatibility mode.)
- Bootstrap draws as many instances as the dataset contains, sampling with replacement; the instances that are never drawn are output as Remaining Data.
- Press Sample Data to output the data sample.
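The sampling methods listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Orange's implementation: the function names `fixed_proportion`, `fixed_size`, `cross_validation_split` and `bootstrap` are hypothetical, each returns a (sample, remaining) pair mirroring the Data Sample and Remaining Data outputs, and widget options such as stratification are left out.

```python
import random

# Hypothetical helpers; each returns (sample, remaining), mirroring the
# Data Sample and Remaining Data outputs. Not Orange's actual code.

def fixed_proportion(data, proportion, seed=0):
    """Put a chosen fraction of the instances (e.g. 0.7) in the sample."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    cut = round(proportion * len(data))
    sample = [data[i] for i in sorted(indices[:cut])]
    remaining = [data[i] for i in sorted(indices[cut:])]
    return sample, remaining


def fixed_size(data, size, replace=False, seed=0):
    """Draw `size` instances; with replace=True every draw uses the whole dataset."""
    rng = random.Random(seed)
    if replace:
        picked = [rng.randrange(len(data)) for _ in range(size)]  # duplicates possible
    else:
        picked = rng.sample(range(len(data)), size)  # each instance at most once
    chosen = set(picked)
    sample = [data[i] for i in picked]
    remaining = [row for i, row in enumerate(data) if i not in chosen]
    return sample, remaining


def cross_validation_split(data, n_subsets, selected, seed=0):
    """All subsets except `selected` form the sample; the selected subset is the remaining data."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    subsets = [indices[k::n_subsets] for k in range(n_subsets)]
    held_out = set(subsets[selected])
    sample = [row for i, row in enumerate(data) if i not in held_out]
    remaining = [row for i, row in enumerate(data) if i in held_out]
    return sample, remaining


def bootstrap(data, seed=0):
    """Draw len(data) instances with replacement; never-drawn instances are the remaining data."""
    rng = random.Random(seed)
    n = len(data)
    picked = [rng.randrange(n) for _ in range(n)]
    chosen = set(picked)
    sample = [data[i] for i in picked]
    remaining = [row for i, row in enumerate(data) if i not in chosen]
    return sample, remaining


rows = list(range(20))  # hypothetical stand-in for 20 data instances
sample, remaining = fixed_size(rows, 5)
print(len(sample), len(remaining))  # prints: 5 15
```

Returning the complement alongside the sample is what makes the second Data Table in the following example possible: every instance not drawn ends up in the remaining data.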
Example
Fixed Sample Size
- First, let’s see how the Data Sampler works. We loaded the Insurance dataset with the File widget.
- We sampled the data with the Data Sampler widget.
- We chose a fixed sample size of 5 instances.
- We can observe the sampled data in the Data Table widget.
- The second Data Table (out of sample) shows the remaining instances that were not in the sample. To output the out-of-sample data, double-click the connection between the widgets and rewire the output to Remaining Data –> Data.
Fixed Proportion of Data (Split into Train and Test)
- Now, we will use the Data Sampler to split the data into training and testing parts. We are using the SuperStore dataset, which we loaded with the File widget.
- In Data Sampler, we split the data with Fixed proportion of data, keeping 70% of the data instances in the sample.
- Then we connected both outputs to the Test and Score widget: Data Sample (70% of the data) –> Data and Remaining Data –> Test Data. Next, we added Logistic Regression as the learner (Logistic Regression –> Test and Score) and selected Test on test data in Test and Score. This trains logistic regression on the Data input and evaluates the results on the Test Data.
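The train/test evaluation described above (fit on Data, score on Test Data) can be sketched in plain Python. This is only an illustration under assumptions: the data is a hypothetical one-feature toy set rather than SuperStore, the 70/30 split is a simple shuffle, and the learner is a plain, unregularized logistic regression trained by batch gradient descent, not the implementation behind Orange's Logistic Regression widget. All function names are hypothetical.

```python
import math
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into a training and a test part."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(rows, lr=0.1, epochs=500):
    """Batch gradient descent on the log-loss; rows are (features, label) pairs."""
    n_features = len(rows[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * gw / len(rows) for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


def accuracy(model, rows):
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


# Hypothetical toy data: the label is 1 when the single feature is positive.
rng = random.Random(42)
data = []
for _ in range(100):
    value = rng.uniform(-3, 3)
    data.append(([value], 1 if value > 0 else 0))

train, test = train_test_split(data, 0.7)  # 70 training rows, 30 test rows
model = fit_logistic_regression(train)     # learn only from the "Data" part
print(len(train), len(test), accuracy(model, test))  # score on the "Test Data" part
```

The key design point mirrored from the workflow is that the model never sees the test rows while it is fitted, so the reported accuracy estimates performance on unseen data.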
Cross Validation
- The most typical procedure is cross validation, which splits the data into k folds and uses k − 1 folds for training and the remaining fold for testing.
- This procedure is repeated, so that each fold has been used for testing exactly once.
- Now, we will use the Data Sampler to split the data into training and testing parts. We are using the SuperStore dataset, which we loaded with the File widget.
- In Data Sampler, we split the data with Cross validation into 10 subsets.
- Then we connected Data Sampler –> Test and Score and added Logistic Regression as the learner (Logistic Regression –> Test and Score).
- Because our dataset is small, we need to reduce the number of folds: with many folds, each test fold contains only a few instances. The suitable number of folds depends on the size of the dataset.
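The k-fold procedure, including the check that every instance is used for testing exactly once, can be sketched as follows. This is a minimal sketch under assumptions: the toy learner (predict the class whose training mean is nearest), the helper names and the 20-instance dataset are hypothetical stand-ins for Logistic Regression and SuperStore, and the `ValueError` guard reflects the point above that the number of folds is limited by the size of the dataset.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of (almost) equal size."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= number of instances")
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[f::k] for f in range(k)]


def fit_class_means(rows):
    """Hypothetical toy learner: remember the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in rows:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(means, x):
    """Predict the class whose mean is closest to x."""
    return min(means, key=lambda y: abs(x - means[y]))


def cross_validate(rows, k, seed=0):
    """Train on k - 1 folds, test on the remaining one, once per fold."""
    folds = k_fold_indices(len(rows), k, seed)
    scores, tested = [], []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [rows[i] for i in range(len(rows)) if i not in held_out]
        model = fit_class_means(train)
        correct = sum(predict(model, rows[i][0]) == rows[i][1] for i in test_idx)
        scores.append(correct / len(test_idx))
        tested.extend(test_idx)  # record which instances were tested
    return scores, tested


# Hypothetical, clearly separable data: class 0 near 0..9, class 1 near 20..29.
rows = [(float(i), 0) for i in range(10)] + [(float(i) + 20.0, 1) for i in range(10)]
scores, tested = cross_validate(rows, k=5)
print(sum(scores) / len(scores))  # prints 1.0
```

With 20 instances and k = 5, each test fold holds 4 instances; on a smaller dataset the same k would leave even fewer instances per fold, which is why the fold count is reduced.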