Accessing Files and Other Resources

Overview

A resource is an abstract term for files, Web pages, or anything else with which your applications could exchange data. The Curl® Runtime Environment (RTE) makes the ways you access these different resources as similar as possible, so you can focus more on how your program processes data, rather than on how you obtain and store it.

The runtime lets you read data from and write data to resources. In addition, you can manipulate certain data resources; for example, deleting or renaming files in a file system.

You need to identify a resource before you can read from it, write to it, or manipulate it. The following section explains how resources are identified. Subsequent sections explain how to interact with the resources that you have identified.

The Curl API also enables you to work with data from external data sources such as data files and databases. See Managing Data from External Sources.

Identifying Resources with URLs

Summary:

Uniform Resource Identifiers (URIs) identify resources.
The Curl RTE primarily uses URLs, a specific type of URI.
Use of URLs makes accessing remote data easy.

Uniform Resource Identifiers (URIs) are used to name resources such as files in the file system or on Web servers. The Curl RTE uses a specific form of URI, the Uniform Resource Locator (URL). URLs are probably familiar to you already, as they are used by Web browsers to identify Web pages you want to view.

Here are some examples of URLs:

http://www.example.com/scripts/example.cgi?searchstring=pandas#results identifies a file located on a Web server. In addition, it specifies a query to be passed to the file, and an anchor that identifies a particular location within the file.
file:///c:/applets/tutorial/start.curl names a file in the client's file system.

Since the runtime uses URLs to access resources, accessing data located on a remote server is as easy as accessing a file in the local file system.

Anatomy of a URL

A URL consists of up to four parts (listed in left-to-right order):

The scheme declaration, which is a string of letters followed by a colon. This is the top-level identification that specifies the protocol used to access the resource. For example, http: is the scheme identifier for resources accessible via the Hypertext Transfer Protocol (HTTP), which are files located on a Web server.
The path that names a specific resource within the scheme. This path often starts with a double slash (//) to mark the root of the directory hierarchy. However, there are schemes (such as mailto:) which do not use slashes.

Elements of a path are separated by path separator characters. Within the runtime this character is usually the forward slash (/), regardless of the character used by the native file system.
An optional query string, which starts with a question mark (?). This string usually contains parameters in ampersand (&) separated identifier=value pairs. For example:

?keyword=pandas&case-sensitive=false

These parameters typically specify some action for the resource to take, such as arguments to a server-side script.

Specific applications may use a different query string format. It is also up to the server to interpret the query string; the server may ignore it entirely.

Not all schemes recognize query strings. If a scheme does not recognize a query string, the string is usually treated as part of the path. The list of recognized schemes below explains which schemes recognize query strings.
An optional anchor, which identifies some location within the resource. Anchors begin with a pound sign (#) followed by a string. They mark a location in the resource that should be displayed.

As with query strings, not all schemes recognize anchors. If a scheme doesn't, any anchor in the URL is usually treated as if it were part of the path.
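As a rough illustration of these parts, the following sketch builds a Url from the sample URL shown earlier and displays three of its pieces. It assumes the full-filename, query, and anchor accessors behave as described in Getting Information about a Url. The variable name example-url is only for illustration. The exact strings returned (for instance, whether the leading ? or # is included) are not shown because they have not been verified here.

```curl
|| A sketch only: split the sample URL into scheme-plus-path, query, and anchor.
{let example-url:Url =
    {url "http://www.example.com/scripts/example.cgi?searchstring=pandas#results"}
}

{spaced-vbox
    {text scheme and path: {value example-url.full-filename}},
    {text query: {value example-url.query}},
    {text anchor: {value example-url.anchor}}
}
```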

Standard Schemes Supported by the Runtime

The standard schemes that the runtime supports are:

file: which names files in the file system. The path portion of a file: URL consists of three parts:
- a double slash (//)
- the name of the system on which the file system is located, followed by a path separator character
  
  Note: In this release, the file: scheme does not support system names.
- a path within the file system to the file
If the name of the system is missing (for example, file:///file.txt) then the URL names a file in the local file system.

The file: scheme supports anchors and queries.

On Windows systems, the first portion of the path within the file system is usually the drive letter followed by a path separator (such as c:/). For example:

file:///c:/windows/system32/somelib.dll

Shared directories on Windows networks are accessible through file: URLs in which the usual file:// prefix is followed by a further double slash, then the name of the server, then the name of the shared directory, such as:

file:////ServerName/ShareName/somefile.txt
http: which names files on Web servers. The Curl RTE supports the standard components of http: URLs, including query strings and anchors. Resources available via the http: scheme are read-only.
https: which is identical to http: except that a Secure Sockets Layer (SSL) connection is used to transfer data from the Web server. SSL connections encrypt the data sent over them to prevent interception, and verify the identity of the server.

Any schemes that are not recognized by the runtime can be stored in the Url class (see below). However, most manipulations of a URL that uses an unsupported scheme will result in an exception being thrown.

The curl: Scheme

In addition to the standard schemes listed above, the runtime gives you access to Curl language-specific resources via the curl: scheme. Within the curl: scheme, the top-level directory determines the type of resource being accessed. The directory curl://string allows you to create string files that let you turn a string into a stream, as if it were being read from a file. See the String Files section for more information.

The directories curl://offline/, curl://root/, curl://http-root/, and curl://local-data are all used in Occasionally Connected Computing; see Occasionally Connected Computing for details. The curl://edit/ directory enables you to invoke the Source Editor programmatically.

Other directories, such as curl://source/ and curl://install/, are reserved for Curl internal use. Do not use them unless specifically instructed to do so.

Using URLs

Summary:

The Url class represents URLs.
For access to local files, use PrivilegedUrls obtained from the standard file dialogs.
Urls let you manipulate the URLs they represent.

The Url class represents URLs in the Curl language. Most methods and procedures in the Curl RTE that need to access files require that the files be identified using Url objects. This class also lets you extract specific portions of the URL that a Url object represents.

Creating Url and PrivilegedUrl instances is explained in the following sections.

Security and File Access

For the user's protection, the Curl RTE limits what applets can do (unless they are running in a special privileged mode). See the Security chapter for an overview of security features of the Curl RTE.

The one important difference between accessing files on a Web site and accessing files locally is the different security restrictions the runtime places on each type of access:

The runtime only allows an applet to access files from a Web site using a URL with the http: or https: scheme if the Web site has agreed to let applets access it. See Configuring Your Web Server for an explanation of how Web sites allow applets to access their files. There is an exception if the user tells the applet it can access a file on a Web server. See Web Access Restrictions and choose-location.
The runtime only allows an applet to access local files if the user has approved the access. An applet gets this approval by calling one of the standard file dialog boxes. See below for details.

Creating Urls for Web Site Resources

Your applets will often want to access a Web site to get a data file, an image, or some other resource. The easiest way to create a Url object that represents such a resource is to use the url primitive. This primitive takes a string containing the URL of the resource and returns a Url object. Here is an example of how to create a Url object that represents a web page:

{let myurl:Url = {url "http://www.example.com/index.html"}}

Having a Url object that represents a file on a Web site does not mean that the file exists or that your applet can access it. It is not until your applet tries to access the resource in some manner, such as by opening a data stream to or from it, that the runtime determines if the applet can access the file. Applets may not be able to access a file for several reasons: the Web site named in the URL may not exist or be available, the file may not exist, or the runtime may determine that the applet is not allowed to access the file.
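Because access is checked only when the resource is actually used, an applet may want to guard that first access. The following is a minimal sketch, assuming the read-open procedure (which opens a text input stream on a Url), the stream's close method, and the general Exception class with its message field. None of these are described in this section, so check them against the API reference before relying on them. The variable names are illustrative.

```curl
{let myurl:Url = {url "http://www.example.com/index.html"}}
{let status:TextDisplay = {TextDisplay width=3in}}

{try
    || The runtime decides here, not when the Url was created,
    || whether the applet can reach the resource.
    let input:TextInputStream = {read-open myurl}
    {input.close}
    set status.value = "Opened " & myurl.name
 catch e:Exception do
    || Reached if the site or file is missing, or access is not allowed.
    set status.value = "Could not open " & myurl.name & ": " & e.message
}

{value status}
```

Catching the general Exception class keeps the sketch short; a real applet would probably distinguish a missing file from a security refusal.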

Passing Urls to Methods and Procedures

Most of the methods and procedures that require you to identify a file accept a Url as an argument. So, even if you do not wish to access a resource directly, you may need to instantiate a Url object to pass to one of these methods or procedures.

For example, the source parameter of the image text format takes a Url to specify the image file to display. You don't need to separately instantiate a Url object and then pass it to image. Instead, you usually just embed the url expression within the image call:

{image source={url "http://www.example.com/imgs/logo.gif"}}

Relative URLs

URLs that spell out an entire path, including a scheme (such as http://www.example.com/imgs/logo.gif) are absolute. They provide all of the information needed to locate the resource. Always giving a full URL for a resource not only increases the amount you have to type, but also makes your documents and programs less portable. If you change the location of the resources on your Web site (such as move some files to a new directory), you will need to go through your source file and change every URL.

Instead, you can give the url expression a relative path: one in which the location of the resource is specified relative to the location of the applet's source file on the Web site. Relative URLs can use the following standard notation for relative paths:

Notation	Description
..	References the parent directory of the current directory. For example, the relative path ../somedir/somefile.txt tells the runtime to locate the file somefile.txt by moving up to the parent directory, then down to the somedir subdirectory.
.	Refers to the current directory. The relative paths ./somefile.txt and somefile.txt are the same.
/	A path separator at the beginning of a relative path means that the relative path starts at the root of the file system.

One common use of a relative path is to name a file in the same directory as the source file being elaborated. Since the file is in the same directory, the relative path consists of just the name of the file.

Example: Using a Relative Path

{image source={url "../../default/images/generic.gif"}}
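To see how a relative path is resolved, one option is to display the name of the resulting Url. This sketch assumes a hypothetical file logo.gif in an images subdirectory next to the applet's source file. The Url.name accessor is described in Getting Information about a Url.

```curl
|| Hypothetical relative path, resolved against the location of this source file.
{let logo-url:Url = {url "images/logo.gif"}}

|| Displays the full URL that the relative path resolved to.
{text Resolved URL: {value logo-url.name}}
```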

Ensuring Absolute URLs

If you want to ensure that a String containing a URL is absolute, you can use the abs-url? procedure to test it. This procedure returns true if the URL appears to be absolute.

In some cases, for security reasons, you may want to enforce the use of an absolute rather than a relative URL. In this case, you can use abs-url in place of url to translate a String into a Url. When given an absolute URL, abs-url works exactly like url. However, abs-url throws an Error if given a String containing a relative URL.

If you are sure a String contains an absolute URL (URLs that you hard-code in your pages, for example), you should use abs-url rather than url to convert it to a Url, as it is more efficient.
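The two procedures can be combined. The following sketch, in which the procedure name checked-url is hypothetical, tests a String with abs-url? first and converts it with abs-url only when the test passes. It returns null otherwise, so the caller can handle the rejection instead of catching an Error.

```curl
|| Returns a Url only for strings that appear to be absolute URLs.
{define-proc {checked-url s:String}:#Url
    {if {abs-url? s} then
        {return {abs-url s}}
    }
    {return null}
}

|| The first string is absolute; the second is relative, so checked-url returns null.
{let good:#Url = {checked-url "http://www.example.com/imgs/logo.gif"}}
{let bad:#Url = {checked-url "../imgs/logo.gif"}}
```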

Creating Url Objects for Local Files

Since users must give permission for applets to access a file on the local file system, instantiating a Url for local files is different from doing so for files on Web sites. Your applet needs to ask the user to select a file for it to access by calling choose-file, choose-multiple-files, or choose-location. These procedures open a dialog that lets the user select a file (or several, in the case of choose-multiple-files). When calling these procedures, your applet states the type of access it wants to the file (read only, read and write, or create a new file). See Creating File Dialogs for an explanation of using the standard file dialogs.

The following example shows you how to call the choose-file procedure to have the user select a single file. It displays the URL of the selected file using the Url.name accessor, which is explained in Getting Information about a Url.

Example: Calling choose-file to Get Local File Url Objects

{let output-area:TextDisplay = {TextDisplay width=3in}}

{VBox
    halign="right",
    output-area,
    {CommandButton
        label="Choose File",
        {on Action do
            {let file-url:#Url =
                {choose-file
                    style=FileDialogStyle.edit,
                    title="Select a File to Edit"
                }
            }
            {if-non-null file-url then
                || The user selected a file. Use it somehow.
                set output-area.value = file-url.name
            }
        }
    }
}

The standard file-selection dialogs return a subclass of Url called PrivilegedUrl if the user selected a file (they return null if the user canceled the file dialog). Since the only way an applet can get a PrivilegedUrl is through the standard file selection dialogs, the Curl runtime can allow the applet to access the file. Even though the file selection dialogs return a PrivilegedUrl object, you should treat it as a Url.

Web Access Restrictions and choose-location

The choose-location file dialog box lets the user enter a URL directly, in addition to browsing for a file. This means the user could enter the URL for a file located on a Web site. If the user does enter a URL for a Web site file, then the Curl runtime allows the applet to access that file, regardless of whether the Web site has a curl-access.txt that grants the applet access.

So, if your applet needs to access files on Web sites which may not have a curl-access.txt file, it can ask the user to manually enter the URL of the file.

Character Encoding in URLs

Certain characters, especially '%', '?', and '#', have special meaning in a URL, and to bypass that special meaning, those characters must be encoded. Some procedures, such as parse-url, automatically encode such characters when it seems appropriate. But the string you pass to url or abs-url, or to a variety of methods on Url, must have already had such characters encoded, either by hand, or using url-encode-filename, if appropriate. The url-encode-string procedure can be used to encode other, less dangerous, characters as well as those which must be encoded for correctness. You can use this procedure directly on strings that may contain unsafe characters, such as strings input by an end user, that you are using to build a URL, especially when building up a query string.
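For example, text typed by a user can be encoded before it is appended to a query string. This is a sketch under assumptions: it assumes url-encode-string takes the string to encode as its only required argument and returns the encoded string, and it reuses the example.cgi address from earlier in this chapter. The variable names are illustrative.

```curl
|| Text as a user might type it; it contains characters that are special in a URL.
{let user-input:String = "giant pandas & bamboo?"}

|| Encode only the value, then append it to the hand-written part of the URL.
{let search-url:Url =
    {url
        "http://www.example.com/scripts/example.cgi?searchstring=" &
        {url-encode-string user-input}
    }
}
```

Encoding only the value, rather than the whole URL, leaves the ? that introduces the query string intact.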

Getting Information about a Url

Summary:

The Url class provides accessors to isolate parts of the URL.
The Url class also contains methods to return a clone of itself, with one portion altered.

The Url class provides accessors that let you extract portions of the URL that the Url represents. The following diagram shows which portion of a URL some of the accessors return:

Figure: The Parts of a URL

These accessors are:

Accessor	Description
anchor	Returns a String containing the anchor portion of the Url, which identifies a location within the resource. If there is no anchor in the URL, this accessor returns the empty string.
basename	Returns a String containing everything before the rightmost period in the Url's file name. If there is no period in the filename, then the entire file name is returned.
extension	Returns a String containing the extension portion of the file named by the Url. Usually, the extension is the rightmost period (.) and everything to the right of it in the file name. If the file name does not contain a period, then the empty string is returned.
filename	Returns a String containing the name of the file (including the extension) in the Url. If the Url does not name a file (for example, the Url's path ends in a directory and contains a trailing slash (/)), filename returns the empty string.
full-filename	Returns a String containing the scheme followed by the path of the Url. In other words, this is the full URL, minus the query and anchor.
leaf	Returns a String that contains the path in the file system of the resource. The scheme, name of the system the resource is located on, anchor, and query string are removed.
local-filename	Returns a String containing the path in the local file system to the file or directory identified by the Url. The path is in the native syntax of the local file system. For example, on a Windows system, local-filename returns a path separated by \ rather than the standard /. If the Url represents a file that is not in the local file system (for example, if its scheme is http:), then local-filename returns null.
name	Returns the full URL that the Url represents as a String.
parent-dir	Returns a Directory object for the directory that contains the file specified by the Url. See the Directory Objects section for more information.
parent-dir-name	The same as parent-dir but returns its results in a String rather than a Directory.
pathname	Returns a String containing the URL represented by the Url minus any anchor.
pathname-tail	Returns a String containing the full file name plus the query string (if any) from the Url.
query	Returns a String containing the Url's query string. If the Url does not contain a query string, then query returns the empty string.
separator	Returns a String that contains the Url's separator character. This is usually a slash (/).
stem	Returns a Directory object that represents the scheme and system name in the Url. See the Directory Objects section for more information.

The following example demonstrates how these accessors interpret a URL. You can enter a valid URL (or just accept the one already entered) in the URL To Analyze box and click Analyze URL to see what the accessors return. The URL you enter does not need to actually resolve to anything, since this example doesn't try to access the resource named by the URL. No error checking is performed, so analyzing an invalid URL may result in an error. Also note that you can enter a relative URL in URL To Analyze.

Example: Dissecting a URL

{value
    let uribox:TextField =
        {TextField width=4in,
            value="http://www.example.com/scripts/example.cgi?search=yes#myanchor"}
    let results:Graphic = {Fill}
    let analyze:CommandButton =
        {CommandButton
            label = "Analyze URL",
            {on Action do
                || resolve the URL.
                let theurl:Url = {url uribox.value}
                set results =
                    {results.replace-with
                        {spaced-vbox
                            {text name is: {value theurl.name}},
                            {text anchor is: {value theurl.anchor}},
                            {text basename is: {value theurl.basename}},
                            {text extension is: {value theurl.extension}},
                            {text filename is: {value theurl.filename}},
                            {text full-filename is: {value theurl.full-filename}},
                            {text leaf is: {value theurl.leaf}},
                            {text local-filename is: {value theurl.local-filename}},
                            {text parent-dir-name is: {value theurl.parent-dir-name}},
                            {text pathname is: {value theurl.pathname}},
                            {text pathname-tail is: {value theurl.pathname-tail}},
                            {text query is: {value theurl.query}},
                            {text separator is: {value theurl.separator}}}}}}

    {spaced-vbox
        {spaced-hbox {text URL To Analyze: }, uribox, analyze},
        results}
}

Copyright © 1998-2019 SCSK Corporation. All rights reserved.
Curl, the Curl logo, Surge, and the Surge logo are trademarks of SCSK Corporation that are registered in the United States. Surge Lab, the Surge Lab logo, and the Surge Lab Visual Layout Editor (VLE) logo are trademarks of SCSK Corporation.