Quantcast
Channel: The SAS Dummy
Viewing all 234 articles
Browse latest View live

Computing a date from the past (or future) with SAS

$
0
0

Social media has brought anniversary dates to the forefront. Every day, my view of Google Photos or Facebook shows me a collection of photos from exactly some number of years ago to remind me of how good things were back then. These apps are performing the simplest of date-based math to trick me. They think, "We have a collection of photos from this month/day from some previous year -- let's collect those together and then prey on Chris' nostalgia instincts."

Sometimes it works and I share them with friends. Weren't my kids so cute back then? It's true, they were -- but I don't pine for those days. I live in the present and I look towards the future -- as all citizens of the world should.

Looking back at dates in SAS

But you know who likes to look at the past? Managers (and probably a few of my colleagues...). Many of my SAS jobs are in support of date-based reports. They answer questions such as, "what happened in the past 30 days?" or "how much activity in the past 6 months?" When I have SAS date values, going back 30 days is simple. The SAS date for "30 days ago" is simply today()-30. (Because, remember, a SAS date is simply an integer representing the count of days since Jan 1, 1960.)

"6 months ago" is a little more nuanced. I could fudge it by subtracting 183 or 184 from the value of today(), but that's not precise enough for the analytical wonks that I work with. Fortunately, SAS provides a function that can compute any date -- given a starting date -- using just about any criteria you can imagine. The function is called INTNX (link to INTNX function doc). Here's an example that computes the date from 6 months ago:

six_mo_ago = intnx(
 'month',  /* unit of time interval */
  today(), /* beginning date */
  -6,      /* number of intervals, negative goes to the past */
  'same'   /* alignment of interval date. "Same" is for same day of month */
 );

If I wanted to go 6 years into the past, I would simply change that first argument to 'year'. 6 days? Change it to 'day'. Here's the (very long) list of built-in intervals that INTNX supports.

Using INTNX in SAS macro language

The form I showed above works with DATA step, but I usually capture this value in a macro variable so that I can reference it across different procedures without having to recompute it. To express this in the SAS macro language, I need to wrap those two function calls (for the TODAY function and the INTNX function) in %SYSFUNC -- the macro function that breaks out of macro processing to invoke built-in SAS functions. I also need to remove the quotes around the interval and alignment values -- the SAS macro processor will treat these as string literals and would not approve of the quotes.

%let six_mo_ago = 
 %sysfunc(
  intnx(
   month,             /* unit of time interval */
   %sysfunc(today()), /* function to get current date */
   -6,                /* number of intervals, negative goes to the past */
   same               /* alignment of interval date. "Same" is for same day of month */
   )
  );
 %put &=six_mo_ago;

Output (from today's run):

45          %put &=six_mo_ago;
SIX_MO_AGO=20960

Notice the result is a number -- which is exactly what I need for date value comparisons. But what if I wanted a formatted version of the date? The %SYSFUNC macro function has an optional second argument that accepts a SAS format name, allowing me to create a value for display.

 %let six_mo_ago_fmt = 
  %sysfunc(
   intnx(
   month,             /* unit of time interval */
   %sysfunc(today()), /* function to get current date */
   -6,                /* number of intervals, negative goes to the past */
   same               /* alignment of interval date. "Same" is for same day of month */
   ), date9.  /* Tell %SYSFUNC how to format the result */
  );
 %put &=six_mo_ago_fmt;

Output (from today's run):

56          %put &=six_mo_ago_fmt;
SIX_MO_AGO_FMT=21MAY2017

Adapting INTNX for SAS datetime values

This computed date works perfectly when my data sets contain SAS date values that I want to filter. For example, I can limit the records to those from the past 6 months with code similar to this:

proc freq data=comm.visits (where=(date > &six_mo_ago.)) 
 noprint 
 ;
tables geo_country_code / out=country_visits_6mo;
run;

But I realized that some of my data sets use datetime values, not date values. A SAS datetime value is simply the number of seconds since midnight on Jan 1, 1960. I've seen programs that adapt the code above by converting the computed cutoff date from a date value to a datetime value -- it's simple to do with math:

proc freq data=comm.visits (where=(event_time > %sysevalf( &six_mo_ago. * 60 * 60 * 24 ))) 
 noprint 
 ;
tables geo_country_code / out=country_visits_6mo;
run;

The %SYSEVALF function tells the SAS macro processor to evaluate the mathematical expression within the parenthesis. Multiplying the SAS date value by the number of seconds in a day (60 seconds * 60 minutes-per-hour * 24 hours-per-day) results in the equivalent datetime value. (Where have I seen such crazy code? Certainly not in my own SAS programs -- no sir. Never happened.)

But this is SAS -- there is an easier way (again) with the INTNX function. INTNX can process datetime values too, and supports special datetime intervals. In this case, all I need to do is swap in the 'dtmonth' interval for 'month', and the DATETIME() function for TODAY():

%let six_mo_ago_dt = 
 %sysfunc(
  intnx(
   dtmonth,              /* unit of time interval */
   %sysfunc(datetime()), /* function to get current datetime */
   -6,                   /* number of intervals, negative goes to the past */
   same                  /* alignment of interval date. "Same" is for same day of month/same time */
   )
  );
 %put &=six_mo_ago_dt;

Output (at this moment):

34          %put &=six_mo_ago_dt;
SIX_MO_AGO_DT=1810991917.93526

I can then use the much more readable version of my comparison code:

proc freq data=comm.visits (where=(event_time > &six_mo_ago_dt)) 
 noprint 
 ;
tables geo_country_code / out=country_visits_6mo;
run;

Learning more about date and datetime intervals

First, an acknowledgment. I recently had the privilege of presenting at the Quebec user groups with SAS-world superstar Marje Fecht. Marje is an excellent instructor, and she delivered a popular talk about working with SAS date values -- a topic that trips up many new SAS users. It's useful for me -- someone who has used SAS for a long time -- to see a basic topic presented to me as if I were brand new. Her talk reminded me of some good practices that I was able to bring home and apply in my own production SAS jobs. I know that Marje was invited to present this talk at SAS Global Forum 2018, and I hope that I did not steal too much of her thunder. She has a good origin story about the 01JAN1960 date decision. You should attend the conference and see her talk in person.

The INTNX (and its sister function for computing date differences, INTCK) are powerful tools for manipulating date and datetime values. Other programming languages offer complex code libraries to accomplish what these two functions can do as part of Base SAS. They are tricky to learn at first, but once you get the hang of them they can really simplify your SAS programs that deal with time-based data.

Rick Wicklin presented a useful introduction to both functions in INTCK and INTNX: Two essential functions for computing intervals between dates in SAS.

While these functions are available in Base SAS, they are maintained by the developers who look after SAS/ETS (econometrics and time series). You'll find comprehensive usage information in the SAS/ETS documentation. The functions are incredibly important, as they are often used to calculate dates and times that are used in legal and financial applications. Those industries take their dates/times very seriously. For a fun example of how deep it can get, see this thread on the SAS Support Communities.

The post Computing a date from the past (or future) with SAS appeared first on The SAS Dummy.


How to scrape data from a web page using SAS

$
0
0

The internet is rich with data, and much of that data seems to exist only on web pages, which -- for some crazy reason -- are designed for humans to read. When students/researchers want to apply data science techniques to analyze collect and analyze that data, they often turn to "data scraping." What is "data scraping?" I define it as using a program to fetch the contents of a web page, sift through its contents with data parsing functions, and save its information into data fields with a structure that facilitates analysis.

Python and R users have their favorite packages that they use for scraping data from the web. New SAS users often ask whether there are similar packages available in the SAS language, perhaps not realizing that Base SAS is already well suited to this task -- no special bundles necessary.

The basic steps for data scraping are:

  1. Fetch the contents of the target web page
  2. Process the source content of the page -- usually HTML source code -- and parse/save the data fields you need.
  3. If necessary, repeat for subsequent pages. This applies to those web sites that serve up lots of information in paginated form, and you want to collect all available pages of data.

Let's map these steps to the SAS programming language:

For this step Use these features
Get the contents of the web page PROC HTTP or FILENAME URL
Process/parse the web page contents DATA step, with parsing functions such as FIND, SCAN, and regular expressions via PRXMATCH. Use SAS informats to convert text to native data types.
Repeat across subsequent pages SAS macro language (%DO %UNTIL processing) or DATA step with CALL EXECUTE to generate multiple iterations of the fetch/parse steps.

 

In the remainder of this article, I'll dive deeper into the details of each step. As an example, I'll present a real question that a SAS user asked about scraping data from the Center for Disease Control (CDC) web site.

Step 0: Find the original data source and skip the scrape

I'm writing this article at the end of 2017, and at this point in our digital evolution, web scraping seems like a quaint pastime. Yes, there are still some cases for it, which is why I'm presenting this article. But if you find information that looks like data on a web page, then there probably is a real data source behind it. And with the movement toward open data (especially in government) and data-driven APIs (in social media and commercial sites) -- there is probably a way for you to get to that data source directly.

In my example page for this post, the source page is hosted by CDC.gov, a government agency whose mandate is to share information with other public and private entities. As one expert pointed out, the CDC shares a ton of data at its dedicated data.cdc.gov site. Does this include the specific table the user wanted? Maybe -- I didn't spend a lot of time looking for it. However, even if the answer is No -- it's a simple process to ask for it. Remember that web sites are run by people, and usually these people are keen to share their information. You can save effort and achieve a more reliable result if you can get the official data source.

If you find an official data source to use instead, you might still want to automate its collection and import into SAS. Here are two articles that cover different techniques:

Web scraping is lossy, fragile process. The information on the web page does not include data types, lengths, or constraints metadata. And one tweak to the presentation of the web page can break any automated scraping process. If you have no other alternative and you're willing to accept these limitations, let's proceed to Step 1.

Step 1: Fetch the web page

In this step, we want to achieve the equivalent of the "Save As..." function from your favorite web browser. Just like your web browser, SAS can act as an HTTP client. The two most popular methods are:

  • FILENAME URL, which assigns a SAS fileref to the web page content. You can then use INFILE and INPUT to read that file from a DATA step.
  • PROC HTTP, which requires a few more lines of code to get and store the web page content in a single SAS procedure step.

I prefer PROC HTTP, and here's why. First, as a separate explicit step it's easier to run just once and then work with the file result over the remainder of your program. You're guaranteed to fetch the file just once. Though a bit more code, it makes it more clear about what happens under the covers. Next reason: PROC HTTP has been refined considerably in SAS 9.4 to run fast and efficient. Its many options make it a versatile tool for all types of internet interactions, so it's a good technique to learn. You'll find many more uses for it.

Here's my code for getting the web page for this example:

/* Could use this, might be slower/less robust */
* filename src url "https://wwwn.cdc.gov/nndss/conditions/search/";
 
/* I prefer PROC HTTP for speed and flexibility */
filename src temp;
proc http
 method="GET"
 url="https://wwwn.cdc.gov/nndss/conditions/search/"
 out=src;
run;

Step 2: Parse the web page contents

Before you can write code that parses a web page, it helps to have some idea of how the page is put together. Most web pages are HTML code, and the data that is "locked up" within them are expressed in table tags: <table>, <tr>, <td> and so on. Before writing the first line of code, open the HTML source in your favorite text editor and see what patterns you can find.

For this example from the CDC, the data we're looking for is in a table that has 6 fields. Fortunately for us, the HTML layout is very regular, which means our parsing code won't need to worry about lots of variation and special cases.

 <tr >
	<td style="text-align:left;vertical-align:middle;">
    	<a href="/nndss/conditions/chancroid/">
		Chancroid
		</a>
	</td>
	<td class="tablet-hidden"></td>
	<td class="tablet-hidden"></td>
	<td class="tablet-hidden"><i>Haemophilus ducreyi</i></td>
	<td  >1944 </td>
	<td  > Current</td>
</tr>

The pattern is simple. We can find a unique "marker" for each entry by looking for the "/nndss/conditions" reference. The line following (call it offset-plus-1) has the disease name. And the lines that are offset-plus-7 and offset-plus-8 contain the date ranges that the original poster was asking for. (Does this seem sort of rough and "hacky?" Welcome to the world of web scraping.)

Knowing this, we can write code that does the following:

  • Reads all of the text lines into a data set and eliminates the blank lines, so we can have a reliable offset calculation later. For this, I used a DATA step with INFILE, using the LENGTH indicator and a simple IF condition to skip the blank lines.
  • Finds each occurrence of the "/nndss/conditions/" token, our cue for the start of a data row we want. I used the FIND function to locate these. If this was a more complex pattern, I would have used PRXMATCH and regular expressions.
  • "Read ahead" to the +1 line to grab the disease name, then to the +8 line to get the end of the date range (which is what the original question asked for). For these read-ahead steps, I used the SET statement with the POINT= option to tell the DATA step to read the data file again, but skip ahead to the exact line (record) that contains the value we need. This trick is one way to simulate a "LEAD" or look-ahead function, which does not exist in the SAS language. (Here's an article that shows another look-ahead method with the MERGE statement.)

Here's the code that I came up with, starting with what the original poster shared:

/* Read the entire file and skip the blank lines */
/* the LEN indicator tells us the length of each line */
data rep;
infile src length=len lrecl=32767;
input line $varying32767. len;
 line = strip(line);
 if len>0;
run;
 
/* Parse the lines and keep just condition names */
/* When a condition code is found, grab the line following (full name of condition) */
/* and the 8th line following (Notification To date)                                */
/* Relies on this page's exact layout and line break scheme */
data parsed (keep=condition_code condition_full note_to);
 length condition_code $ 40 condition_full $ 60 note_to $ 20;
 set rep;
 if find(line,"/nndss/conditions/") then do;
   condition_code=scan(line,4,'/');
   pickup= _n_+1 ;
   pickup2 = _n_+8;
   /* "read ahead" one line */
   set rep (rename=(line=condition_full)) point=pickup;
   condition_full = strip(condition_full);
 
   /* "read ahead" 8 lines */
   set rep (rename=(line=note_to)) point=pickup2;
   /* use SCAN to get just the value we want */
   note_to = strip(scan(note_to,2,'<>'));
 
   /* write this record */
   output;
  end;
run;

The result still needs some cleanup -- there are a few errant data lines in the data set. However, it should be simple to filter out the records that don't make sense with some additional conditions. I left this as a to-do exercise to the original poster. Here's a sample of the result:

Step 3: Loop and repeat, if necessary

For this example with the CDC data, we're done. All of the data that we need appears on a single page. However, many web sites use a pagination scheme to break the data across multiple pages. This helps the page load faster in the browser, but it's less convenient for greedy scraping applications that want all of the data at once. For web sites that paginate, we need to repeat the process to fetch and parse each page that we need.

Web sites often use URL parameters such as "page=" to indicate which page of results to serve up. Once you know the URL parameter that controls this offset, you can create a SAS macro program that iterates through all of the pages that you want to collect. There are many ways to accomplish this, but I'll share just one here -- and I'll use a special page from the SAS Support Communities (communities.sas.com) as a test.

Aside from the macro looping logic, my code uses two tricks that might be new to some readers.

/* Create a temp folder to hold the HTML files */
options dlcreatedir;
%let htmldir = %sysfunc(getoption(WORK))/html;
libname html "&htmldir.";
libname html clear;
 
/* collect the first few pages of results of this site */
%macro getPages(num=);
 %do i = 1 %to &num.;
   %let url = https://communities.sas.com/t5/custom/page/page-id/activity-hub?page=&i.;
   filename dest "&htmldir./page&i..html";
   proc http 
      method="GET"
	  url= "&url."
	  out=dest;
   run;
 %end;
%mend;
 
/* How many pages to collect */
%getPages(num=3);
 
/* Use the wildcard feature of INFILE to read all */
/* matching HTML files in a single step */
data results;
 infile "&htmldir./*.html" length=len lrecl=32767;
 input line $varying32767. len ;
 line = strip(line);
 if len>0;
run;

The result of this program is one long data set that contains the results from all of the pages -- that is, all of lines of HTML code from all of the web pages. You can then use a single DATA step to post-process and parse out the data fields that you want to keep. Like magic, right?

Where to learn more

Each year there are one or two "web scraping" case studies presented at SAS Global Forum. You can find them easily by searching on the keyword "scrape" from the SAS Support site or from lexjansen.com. Here are some that I found interesting:

How JMP 13 sees the CDC page

If you have access to JMP software from SAS, you could try the File->Internet Open... feature. It's useful for a one-time, ad-hoc parsing step that you can later capture into a JSL script for future use. With Internet Open you can specify a URL and select to open it "as HTML," and then JMP will offer a selection of available tables to import as data. This is definitely worth a try if you have JMP on your desktop, as it can lend some insight about the structure of the page you're trying to scrape.

JMP 13 version of the data, imported

The post How to scrape data from a web page using SAS appeared first on The SAS Dummy.

Tricks for importing text files in SAS Enterprise Guide

$
0
0

I'm a big fan of the Import Data task in SAS Enterprise Guide, especially for its support of text-based files (CSV, tab delimited, fixed width, and more). There's no faster method for generating SAS code that reads your data exactly the way you need it. I use the tool so often that I take for granted some of its neatest features, and I forget that many new users (and even veteran users) might not know about them. In this article, I'll review a few of the cool things that this task can do for you.

Read fixed-width text files into SAS

We think of CSV files (and...alas...Excel files) as the main standard for data exchange among systems, but many legacy systems still produce and consume fixed-width text data formats. The SAS DATA step is a perfect tool for reading these files, but defining the columns and their properties can be tedious. The "Fixed columns" option on the Import Data task can make this job simple.

Suppose that you're beginning with a spec like this:

And a raw data file like this:

You can use the Import Data wizard to define the boundaries of your columns by adding boundary lines with just click-and-drag operations. Beginning with the File->Import Data task, select your source text file and advance to the second page of the wizard. When you select "Fixed columns" as the input text format, you'll see a layout ruler that looks like this:

Click at the column boundaries (referring to your original spec!) and drag the rule lines as needed to define those column boundaries. Then click Next, and fill out details for the column names and types:

Which then tells the Import Data task how to generate the proper INPUT statements:

When you click Finish, you end up with a data set that's ready for business:

Modify the properties for multiple columns -- with one step

Here's a click-saving trick. Sometimes you have an input data file that contains many columns that share the same properties: type, length, and SAS format. It can be tedious to click and modify the properties of each column that you want to import. There's a shortcut on the Define Field Attributes page of the wizard that you can use to change the attributes for several columns at the same time. Simply SHIFT+Click to select multiple column definitions on the page, then click Modify.... The "Field Attributes for Multiple Selections" window appears, and you can change the necessary attributes just once and apply to the many items you picked.

This trick works as you import any text file or Excel file.

Create SAS program code that you can reuse anywhere

In a previous article I described how the Import Data task works "behind the scenes." Some of the magic that the task performs is not captured in SAS code, and that can present a challenge when you want to reuse this work in other settings -- for example, in a batch process or in a larger SAS program. However, with a couple of tweaks you can coerce the Import Data task into creating SAS code that you can almost just "lift and shift," as is.

The first option is hidden under the Performance window, labeled as "Bypass the data cleansing process." By default, the Import Data task reformats your input text file to normalize it for a cleaner import step. While doing no harm, most of the time this step isn't needed -- especially if your original data file is well formed. And since this step changes the input file, it's isn't repeatable outside of this task. My first tip for the best reusable code: click Performance... on the first page of the wizard, then select the "Bypass.." checkbox. That guarantees that the code will be formulated to read your original raw file. (Note that the Performance button is available only when importing text files, not Excel files.)

The second option you'll want to change is related to this, but you'll find it on the final page with the Advanced Options. Select "Generalize import step to run outside of SAS Enterprise Guide." This ensures that the task won't attempt any behind-the-scenes monkey business with your original file -- everything is captured in the DATA step that the task generates. Well, almost everything...

The one missing piece, a confounding factor when you select a local text file to import on a remote SAS Workspace session, is the transfer of the local file to the remote server. SAS Enterprise Guide copies the file for you -- behind the scenes -- and there is no SAS code to represent this step.

You can take control of even this step, though, if you make use of the Copy Files task (now available for you on the Tasks->Data menu). You can then copy the file from a local source folder, and land it wherever you want on the SAS server. Modify your newly repurposed Import Data code to pull from that server-based destination, giving you more control over the individual steps in the import process.

Learn more about importing text files

If you're new to importing data into SAS, whether using a SAS program or SAS Enterprise Guide, you might learn some of the basics from these video tutorials that were produced by SAS instructors:

The post Tricks for importing text files in SAS Enterprise Guide appeared first on The SAS Dummy.

How to share your SAS knowledge with your professional network

$
0
0

As you might have heard, sasCommunity.org -- a wiki-based web site that has served as a user-sourced SAS repository for over a decade -- is winding down. This was a difficult decision taken by the volunteer advisory board that runs the site. However, the decision acknowledges a new reality: SAS professionals have many modern options for sharing and promoting their professional work, and they are using those options. In 2007, the birth year of sasCommunity.org, the technical/professional networking world was very different than it is today. LinkedIn was in its infancy. GitHub didn't exist. SAS Support Communities (communities.sas.com) was an experiment just getting started with a few discussion forums. sasCommunity.org (and its amazing volunteers) blazed a trail for SAS users to connect and share, and we'll always be grateful for that.

Even with the many alternatives we now have, the departure of sasCommunity.org will leave a gap in some of our professional sharing practices. In this article, I'll share some ideas that you can use to fill this gap, and to extend the reach of your SAS knowledge beyond just your SAS community colleagues. Specifically, I'll address how you can make the biggest splash and have an enduring impact with that traditional mode of SAS-knowledge sharing: the SAS conference paper.

Extending the reach of your SAS Global Forum paper

Like many of you, I've written and presented a few technical papers for SAS Global Forum (and also for its predecessor, SUGI). With each conference, SAS publishes a set of proceedings that provide perpetual access to the PDF version of my papers. If you know what you're looking for, you can find my papers in several ways:

All of these methods work with no additional effort from me. When your paper is published as part of a SAS conference, that content is automatically archived and findable within these conference assets. But for as far as this goes, there is opportunity to do so much more.

Write an article for SAS Support Communities

ArtC's presenter page

sasCommunity.org supported the idea of "presenter pages" -- a mini-destination for information about your conference paper. As an author, you would create a page that contains the description of your paper, links to supporting code, and any other details that you wanted to lift out of the PDF version of your paper. Creating such a page required a bit of learning time with the wiki syntax, and just a small subset of paper presenters ever took the time to complete this step. (But some prolific contributors, such as Art Carpenter or Don Henderson, shared blurbs about dozens of their papers in this way.) Personally, I created a few pages on sasCommunity.org to support my own papers over the years.

SAS Support Communities offers a similar mechanism: the SAS Communities Library. Any community member can create an article to share his or her insights about a SAS related topic. A conference paper is a great opportunity to add to the SAS Communities Library and bring some more attention to your work. A communities article also serves as platform for readers to ask you questions about your work, as the library supports a commenting feature that allows for discussion.

Since sasCommunity.org has announced its retirement plans, I took this opportunity to create new articles on SAS Support Communities to address some of my previous papers. I also updated the content, where appropriate, to ensure that my examples work for modern releases of SAS. Here are two examples of presentation pages that I created on SAS Support Communities:

One of my presentations on in the SAS Communities Library

When you publish a topic in the SAS Communities Library, especially if it's a topic that people search for, your article will get an automatic boost in visitors thanks to the great search engine traffic that drives the communities site. With that in mind, use these guidelines when publishing:

  • Use relevant key words/phrases in your article title. Cute and clever titles are a fun tradition in SAS conference papers, and you should definitely keep those intact within the body of your article. But reserve the title field for a more practical description of the content you're sharing.
  • Include an image or two. Does your paper include an architecture diagram? A screen shot? A graph or plot? Use the Insert Photos button to add these to your article for visual interest and to give the reader a better idea of what's in your paper.
  • Add a snippet of code. You don't have to attach all of your sample code with hundreds of program lines, but a little bit of code can help the reader with some context. Got lots of code? We'll cover that in the next section.

To get started with the process for creating an article...see this article!

Share your code on GitHub

SAS program code is an important feature in SAS conference papers. A code snippet in a PDF-style paper can help to illustrate your points, but you cannot effectively share entire programs or code libraries within this format. Code that is locked up in a PDF document is difficult for a reader to lift and reuse. It's also impossible to revise after the paper is published.

GitHub is a free service that supports sharing and collaboration for any code-based technology, including SAS. Anyone who works with code -- data scientists, programmers, application developers -- is familiar with GitHub at least as a reader. If you haven't done so already, it might be time to create your own GitHub account and share your useful SAS code. I have several GitHub repositories (or "repos" as we GitHub hipsters say) that are related to papers, blog posts, and books that I've written. It just feels like a natural way to share code. Occasionally a reader suggests an improvement or finds a bug, and I can change the code immediately. (Alas, I cannot go back in time and change a published paper...)

A sample of conference-paper-code on my GitHub.

List your published work on your LinkedIn profile

So, you've presented your work at a major SAS conference! Your professional network needs to know this about you. You should list this as an accomplishment on your resume, and definitely on your LinkedIn profile.

LinkedIn offers a "publication" section -- perfect for listing books and papers that you've written. Or, you can add this to the "projects" section of your profile, especially if you collaborate with someone else that you want to include in this accomplishment. I have yet to add my entire back-catalog of conference papers, but I have added a few recent papers to my LinkedIn profile.

One of a few publications listed on my LinkedIn profile

Bonus step: write about your experience in a LinkedIn article

Introspection has a special sort of currency on LinkedIn that doesn't always translate well to other places. A LinkedIn article -- a long-form post that you write from a first-person perspective -- gives you a chance to talk about the deeper meaning of your project. This can include the story of inspiration behind your conference paper, personal lessons that you learned along the way, and the impact that the project had in your workplace and on your career. This "color commentary" adds depth to how others see your work and experience, which helps them to learn more about you and what drives you.

Here are a few examples of what I'm talking about:

It's not about you. It's about us

The techniques I've shared here might sound like "how to promote yourself." Of course, that's important -- we each need to take responsibility for our own self-promotion and ensure that our professional achievements shine through. But more importantly, these steps play a big role helping your content to be findable -- even "stumble-uponable" (a word I've just invented). You've already invested a tremendous amount of work into researching your topic and crafting a paper and presentation -- take it the extra bit of distance to make sure that the rest of us can't miss it.

The post How to share your SAS knowledge with your professional network appeared first on The SAS Dummy.

How to list your mapped drives in a SAS program

$
0
0

If you work in a team environment, you might be accustomed to using mapped network drives for source data folders or to publish results. If you've recently moved to a SAS server environment, you might not have those mapped drives available. How can you tell?

This question was posted on the SAS Support Communities, and SAS Communities member Patrick provided a handy code snippet that does the trick. Patrick learned the trick himself from SAS Note 24818.

The following code uses a special DRIVEMAP device for the FILENAME statement. DRIVEMAP works only on Windows systems, which are the only systems that support the concept of mapped drives.

filename diskinfo DRIVEMAP;
data _null_;
  infile diskinfo;
  input;
  put _infile_;
run;

Here's the output on my PC, where I have a two mapped network drives, U: and W:. (C: and D: are part of my laptop device.)

NOTE: The infile DISKINFO is:
      Drivemap Access Device,
      PROCESS,RECFM=V,LRECL=32767
C:
D:
U:
W:

Mapped drives are typically available only from your local machine. Even if you use SAS on a remote Windows server, the process that defines the mapped drive aliases is usually not triggered when you connect. If this is the case for you, you'll have to adjust your process to use UNC path notation (example: "\\server\folder\project" instead of the familiar "P:\project" shortcut). (Additional note for SAS admins: on a Windows SAS server, you must mark the server machine as Trusted for Delegation to allow end users to access these network paths.)

If you've moved from a local PC version of SAS to a remote non-Windows server (these days, Linux is common), then I'm afraid that "mapped drives" are a lost concept. You'll need to change your code to use Unix-style paths. A SAS admin or Unix-admin might need to help you to find the correct locations.

The post How to list your mapped drives in a SAS program appeared first on The SAS Dummy.

How to secure your REST API credentials in SAS programs

$
0
0

I've used SAS with a bunch of different REST APIs: GitHub, Brightcove, Google Analytics, Lithium, LinkedIn, and more. For most of these I have to send user/password or "secret" application tokens to the web service so that it knows who I am and what data I can retrieve. I do not want to keep this secret information in my SAS program files -- that would be a bad idea. If my credentials were part of the program -- even if they were obfuscated and not stored in clear text -- then anyone who managed to get a copy of my program could run it. And they could gain access to my data, as if they were me.

I've written about this topic for SAS-related passwords. In this article, I'll share the approach that I use for API credentials and tokens.

REST APIs: Each service requires different types of secrets

My REST API services don't require just simple user ID and password combos. It depends on the API, but usually the information is in the form of one or more tokens that I've generated using the vendor's developer console, or perhaps that have been granted by an administrator.

For example, to access the Google Analytics API, I need three things: a client ID, a client secret token, and a valid "refresh" token. I can send these three items to the Google OAuth2 API, and in return I'll receive a live "access" token that I can use to request my data. I think of this like checking into a hotel. I show my ID and a credit card at the front desk, and in exchange I receive a room key. Just like my hotel room key, the access token doesn't last forever and cannot be reused on my next visit.

Other APIs are simpler and require just a single token that never expires. That's more like a house key -- it's mine to use forever, or until someone decides to change the locks.

Whether a static token or a token-for-token exchange, I don't want to leave these keys lying around for just anyone to find and use.

Hide your tokens in a file that only you can read

My simple approach is to store the token values in a text file within my home directory. Then, I change the permissions on the file such that only my account can read it. Whether I submit my program interactively (in SAS Enterprise Guide or SAS Studio) or as a scheduled batch job, it's running under my account. I'm showing the instructions here for UNIX/Linux, but Windows users can accomplish something similar with Windows permissions.

On Linux, I've used the chmod command to specify the octal value that says "only the owner can read/write." That's "chmod 600 filename". The "ls -l" command shows that this permissions mask has been applied.

chmod 600 ./.google_creds.csv
ls -l ./.google_creds.csv
> -rw------- 1 myid mygroup 184 Jan 15 12:41 ./.google_creds.csv

I stored my tokens in a standard CSV format because it's easy for SAS to read and it's easy for me to read if I ever need to change it.

Use INFILE to read the tokens dynamically

With this critical data now stored externally, and the file permissions in place, I can use SAS to read the credentials/tokens within my program and store the values in SAS macro variables. In the following SAS program, I assigned a macro variable to my user root folder. Since I might run this program on Linux or Windows, I used this trick to determine the proper path notation. I also used the &SYSUSERID macro variable to make my program more portable. If I want to supply this program to any colleagues (or to you!), the only thing that's needed is to create and store the token CSV files in the proper location.

/* My path is different for UNIX vs Windows */
%let authpath = %sysfunc(ifc(&SYSSCP. = WIN,
	 \\netshare\root\u\&sysuserid.,
	 /u/&sysuserid.));
 
/* This should be a file that only YOU or trusted group members can read */
/* Use "chmod 0600 filename" in UNIX environment */
/* "dotfile" notation is convention for on UNIX for "hidden" */
filename auth "&authpath./.google_creds.csv";
 
/* Read in the secret account keys from another file */
data _null_;
 infile auth firstobs=2 dsd delimiter=',' termstr=crlf;
 length client_id $ 100 client_secret $ 30 refresh_token $ 60;
 input client_id client_secret refresh_token;
 call symputx('client_id',client_id);
 call symputx('client_secret',client_secret);
 call symputx('refresh_token',refresh_token);
run;

When I run this code in my production job, I can see the result:

NOTE: The infile AUTH is:
      Filename=/u/myid/.google_creds.csv,
      Owner Name=myid,Group Name=mygroup,
      Access Permission=rw-------,
      Last Modified=Mon Jan 15 12:41:58 2018,
      File Size (bytes)=184

NOTE: 1 record was read from the infile AUTH.
      The minimum record length was 145.
      The maximum record length was 145.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      user cpu time       0.01 seconds

For this example, my next step is to call the Google API to get my access token. I'll use the macro variables that my program created with CALL SYMPUTX to build the proper API call.

/* Call Google API to exchange the refresh token for an active access token */
%let oauth2=https://www.googleapis.com/oauth2/v4/token;
filename rtoken temp;
proc http
 method="POST"
 url="&oauth2.?client_id=&client_id.%str(&)client_secret=&client_secret.%str(&)grant_type=refresh_token%str(&)refresh_token=&refresh_token."
 out=rtoken;
run;

See the full explanation of this Google Analytics example in this article.

The post How to secure your REST API credentials in SAS programs appeared first on The SAS Dummy.

How to test PROC HTTP and the JSON library engine

$
0
0

Using SAS with REST APIs is fun and rewarding, but it's also complicated. When you're dealing with web services, credentials, data parsing and security, there are a lot of things that can go wrong. It's useful to have a simple program that verifies that the "basic plumbing" is working before you try to push a lot of complex coding through it.

I'm gratified that many of my readers are able to adapt my API examples and achieve similar results. When I receive a message from a frustrated user who can't get things to work, the cause is almost always one of the following:

The following SAS program is a simple plumbing test. It uses a free HTTP test service (httpbin.org) to verify your Internet connectivity from SAS and your ability to use SSL. The endpoint returns a JSON-formatted collection of timestamps in various formats, which the program parses using the JSON library engine. I have successfully run this program from my local SAS on Windows, from SAS University Edition (Dec 2017 release), and from SAS OnDemand for Academics (using SAS Studio).

If you can run this program successfully from your SAS session, then you're ready to attempt the more complex REST API calls. If you encounter any errors while running this simple test, then you will need to resolve these before moving on to the really useful APIs. (Like maybe checking on who is in space right now...)

/* PROC HTTP and JSON libname test          */
/* Requires SAS 9.4m4 or later to run       */
/* SAS University Edition Dec 2017 or later */
 
/* utility macro to echo the JSON out */
%macro echoFile(fn=);
  data _null_;
   infile &fn end=_eof;
   input;
   put _infile_;
  run;
%mend;
 
filename resp "%sysfunc(getoption(WORK))/now.json";
proc http
 url="https://now.httpbin.org/"
 method="GET"
 out=resp;
run;
 
/* Supported with SAS 9.4 Maint 5 */
/*
 %put HTTP Status code = &SYS_PROCHTTP_STATUS_CODE. : &SYS_PROCHTTP_STATUS_PHRASE.; 
*/
 
%echoFile(fn=resp);
 
/* Tell SAS to parse the JSON response */
libname time JSON fileref=resp;
 
title "JSON library structure";
proc datasets lib=time;
quit;
 
/* interpret the various datetime vals and convert to SAS */
data values (keep=from:);
 length from_epoch 8
        from_iso8601 8
        from_rfc2822 8
        from_rfc3339 8;
 
 /* Apply the DT format to a range of vars */
 format from: datetime20.;
 set time.now;
 
 /* epoch = # seconds since 01Jan1970        */
 /* SAS datetime is based on 01Jan1960, so   */
 /* add the offset here for a SAS value      */
 from_epoch = epoch + '01Jan1970:0:0:0'dt;
 
 /* straight conversion to ISO8061           */
 from_iso8601 = input(iso8601,is8601dt.);
 
 /* trim the leading "day of week"      */
 from_rfc2822 = input(substr(rfc2822,5),anydtdtm21.);
 
 /* allow SAS to figure it out */
 from_rfc3339 = input(rfc3339,anydtdtm.);
 
 run;
 
title "Raw values from the JSON response";
proc print data=time.now (drop=ord:);
run;
 
title "Formatted values as SAS datetime";
proc print data=values;
run;
 
libname time clear;
filename resp clear;

The post How to test PROC HTTP and the JSON library engine appeared first on The SAS Dummy.

Coding in Python with SAS University Edition

$
0
0

Good news learners! SAS University Edition has gone back to school and learned some new tricks.

With the December 2017 update, SAS University Edition now includes the SASPy package, available in its Jupyter Notebook interface. If you're keeping track, you know that SAS University Edition has long had support for Jupyter Notebook. With that, you can write and run SAS programs in a notebook-style environment. But until now, you could not use that Jupyter Notebook to run Python programs. With the latest update, you can -- and you can use the SASPy library to drive SAS features like a Python coder.

Oh, and there's another new trick that you'll find in this version: you can now use SAS (and Python) to access data from HTTPS websites -- that is, sites that use SSL encryption. Previous releases of SAS University Edition did not include the components that are needed to support these encrypted connections. That's going to make downloading web data much easier, not to mention using REST APIs. I'll show one HTTPS-enabled example in this post.

How to create a Python notebook in SAS University Edition

When you first access SAS University Edition in your web browser, you'll see a colorful "Welcome" window. From here, you can (A) start SAS Studio or (B) start Jupyter Notebook. For this article, I'll assume that you select choice (B). However, if you want to learn to use SAS and all of its capabilities, SAS Studio remains the best method for doing that in SAS University Edition.

When you start the notebook interface, you're brought into the Jupyter Home page. To get started with Python, select New->Python 3 from the menu on the right. You'll get a new empty Untitled notebook. I'm going to assume that you know how to work with the notebook interface and that you want to use those skills in a new way...with SAS. That is why you're reading this, right?

Move data from a pandas data frame to SAS

pandas is the standard for Python programmers who work with data. The pandas module is included in SAS University Edition -- you can use it to read and manipulate data frames (which you can think of like a table). Here's an example of retrieving a data file from GitHub and loading it into a data frame. (Read more about this particular file in this article. Note that GitHub uses HTTPS -- now possible to access in SAS University Edition!)

import saspy
import pandas as pd
 
df = pd.read_csv('https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv')
df.describe()

Here's the result. This is all straight Python stuff; we haven't started using any SAS yet.

Before we can use SAS features with this data, we need to move the data into a SAS data set. SASPy provides a dataframe2sasdata() method (shorter alias: df2sd) that can import your Python pandas data frame into a SAS library and data set. The method returns a SASdata object. This example copies the data into WORK.PROBLY in the SAS session:

sas = saspy.SASsession()
probly = sas.df2sd(df,'PROBLY')
probly.describe()

The SASdata object also includes a describe() method that yields a result that's similar to what you get from pandas:

Drive SAS procedures with Python

SASPy includes a collection of built-in objects and methods that provide APIs to the most commonly used SAS procedures. The APIs present a simple "Python-ic" style approach to the work you're trying to accomplish. For example, to create a SAS-based histogram for a variable in a data set, simply use the hist() method.

SASPy offers dozens of simple API methods that represent statistics, machine learning, time series, and more. You can find them documented on the GitHub project page. Note that since SAS University Edition does not include all SAS products, some of these API methods might not work for you. For example, the SASml.forest() method (representing the HPFOREST procedure) works only when you have SAS Enterprise Miner. (And no, that's not included in SAS University Edition.)

"Teach me SAS": SAS code from Python code

In SASPy, all methods generate SAS program code behind the scenes. If you like the results you see and want to learn the SAS code that was used, you can flip on the "teach me SAS" mode in SASPy.

sas.teach_me_sas('true')

Here's what SASPy reveals about the describe() and hist() methods we've already seen:

Interesting code, right? Does it make you want to learn more about the STACKEDODSOUTPUT option on PROC MEANS? Or the SCALE= option on PROC SGPLOT?

If you want to experiment with SAS statements that you've learned, you don't need to leave the current notebook and start over. There's also a built-in %%SAS "magic command" that you can use to try out a few of these SAS statements.

%%SAS
proc means data=sashelp.cars stackodsoutput n nmiss median mean std min p25 p50 p75 max;run;

Python limitations in SAS University Edition

SAS University Edition includes over 300 Python modules to support your work in Jupyter Notebook. To see a complete list, run the help('modules') command from within a Python notebook. This list includes the common Python packages required to work with data, such as pandas and NumPy. However, it does not include any of the popular Python-based machine learning modules, nor any modules to support data visualization. Of course, SASPy has support for most of this within its APIs, so why would you need anything else...right?

Because SAS University Edition is packaged in a virtual machine that you cannot alter, you don't have the option of installing additional Python modules. You also don't have access to the Jupyter terminal, which would allow you to control the system from a shell-like interface. All of this is possible (and encouraged) when you have your own SAS installation with your own instance of SASPy. It's all waiting for you when you've outgrown the learning environment of SAS University Edition and you're ready to apply your SAS skills and tech to your official work!

Learn more

The post Coding in Python with SAS University Edition appeared first on The SAS Dummy.


SAS functions to encode and decode data for the Web

$
0
0

Sir Tim Berners-Lee is famous for inventing the World Wide Web and for the construction of URLs -- a piece of syntax that every 8-year-old is now familiar with. According to the lore, when Sir Tim invented URLs he did not imagine that Internet surfers of all ages and backgrounds would be expected to type these cryptic schemes into their browser windows...but here we are. Today's young people can navigate the Web with URLs in the same way that migratory birds can find their way South for the winter -- by pure instinct.

URLs are syntax, the language of Web navigation. And HTML is syntax too, the language of the Web page. Each of these represent instructions to our web browser, telling it to how to go somewhere or display something. As we navigate the web programmatically, it is sometimes necessary to encode information in a URL or in HTML in a way that it won't be mistakenly interpreted as syntax. And of course, SAS has some functions for that.

Here's a table with links to the SAS documentation for these functions. You'll find they are intuitively named. The names are the same or similar to corresponding functions in other programming languages -- you can get only so creative with basic functions like these.

HTMLDECODE Function   Decodes a string that contains HTML numeric character references or HTML character entity references, and returns the decoded string.
HTMLENCODE Function   Encodes characters using HTML character entity references, and returns the encoded string.
URLDECODE Function   Returns a string that was decoded using the URL escape syntax.
URLENCODE Function   Returns a string that was encoded using the URL escape syntax.

Of these four functions, I use URLENCODE the most often. I need it when I need to pass syntax for instructions to a REST API, in which the API call itself is a URL. Here's an example from my paper about accessing Google Analytics with SAS:

%let workdate='01Oct2017'd;
%let urldate=%sysfunc(putn(&workdate.,yymmdd10.));
%let metrics=%sysfunc(urlencode(%str(ga:pageviews,ga:sessions)));
%let id=%sysfunc(urlencode(%str(ga:XXXXXX)));
filename garesp temp;
proc http
  url="https://www.googleapis.com/analytics/v3/data/ga?ids=&id.%str(&)start-date=&urldate.%str(&)end-date=&urldate.%str(&)metrics=&metrics.%str(&)max-results=20000"
  method="GET" out=garesp;
  headers 
    "Authorization"="Bearer &access_token."
    "client-id:"="&client_id.";
 run;

In the previous example, I need to include "ga:pageviews,ga:sessions" as an instruction on the Google Analytics API, but the colon character has special meaning within the URL syntax. I need to "escape" the colon character so that the URL parser ignores it and simply passes the value to the API. The URLENCODE function converts this segment to "ga%3Apageviews,ga%3Asessions". The colon has been replaced by its hexadecimal code, set off by a percent sign: %3A.

An aside: the percent (%) and ampersand (&) characters have special meaning in URLs. They also have special meaning in the SAS macro language. We can claim that the SAS macro language preceded URL syntax by decades, but there are only so many characters on the keyboard that syntax designers can use to set off code instructions. I use the %str() macro function to tell SAS to process certain URL segments as literals and don't allow the macro parser to have a turn at them. This helps my program to avoid warnings like: "WARNING: Apparent symbolic reference END not resolved."

HTMLENCODE is useful when you need to represent HTML syntax in your output, but prevent the web browser from interpreting the HTML as code. Here's a simple example.

filename out temp;
ods html5 file=out;
data link;
  site = "sas.com";
  link = "<a href='https://www.sas.com'>sas.com</a>";
  code_as = htmlencode(link);
run;
 
proc print data=link;
run;
ods html5 close;

When produced using the HTML5 destination, the link variable is formatted as a live link, while the code_as variable shows the syntax that went into it.

As you might expect, the URLDECODE function "unescapes" the URL hex characters and restores the original URL syntax. HTMLDECODE does the same for HTML content. If you are writing code that implements an API endpoint in SAS (as you might do with a SAS stored process on the back end of a web service), you'll find these functions useful to unpack the information that was encoded on an API call.

HTMLENCODE and URLENCODE are not interchangeable. More than once, I have written programs that mistakenly use HTMLENCODE when it was URLENCODE that was needed. Those mistakes can be tricky to debug, so pay attention!

Cover image by Fabio Lanari, Internet2, CC BY-SA 4.0

The post SAS functions to encode and decode data for the Web appeared first on The SAS Dummy.

A productive future for SAS Enterprise Guide

$
0
0

The title of this blog says what you really need to know: SAS Enterprise Guide does have a future, and it's a bright one. Ever since SAS Studio debuted in 2014, onlookers have speculated about its impact on the development of SAS Enterprise Guide.

I think that we have been consistent with our message that SAS Enterprise Guide serves an important purpose -- a power-user interface for SAS on the desktop -- and that the product will continue to get support and new features. But that doesn't stop folks from wondering whether it might meet a sudden demise like a favorite Star Wars or Game of Thrones character.

I recently recorded a session with Amy Peters, the SAS product manager for SAS Enterprise Guide and SAS Studio. Amy loves to meet with SAS users and hear their successes, their concerns, and their ideas. Her enthusiasm for SAS Enterprise Guide comes through in this video, even as I bumble my way through the prototype of the Next Big Release.

Coming soon: the features of a modern IDE

In addition to a much-needed makeover and modern appearance, the new version of SAS Enterprise Guide (scheduled for sometime in 2019) addresses many of the key requests that we hear from SAS users. First, the new version blows open the window management capabilities. You can open and view many items -- programs, data, log, results -- at the same time, and arrange those views exactly as you want. You can spread your workspace over multiple displays. And you can tear away or dock each item to suit your working style.

Screenshot of Future EG

(in development) screenshot of SAS Enterprise Guide

Second, you can decide whether you want to work with a SAS Enterprise Guide project -- or just simply write and run code. Currently you must start with a project before you can create or open anything else. The new version allows you leverage a project to organize your work...or not, depending on your need at the moment.

And finally, you can expect more alignment and collaboration features between SAS Studio and SAS Enterprise Guide. We see that more users find themselves using both interfaces for related tasks, and presenting a common experience is important. SAS Studio runs in your browser while SAS Enterprise Guide works on your desktop. Each application has different capabilities related to that, but there's no reason that they need to be so different, right?

For more information about what the future will bring, check out the communities article that recaps the SAS Global Forum 2018 presentation. It includes an attached presentation slide deck with many exciting screenshots and roadmap details. All of this is subject to change, of course (including release dates!), but I think it's safe to say the future is bright for SAS users who love their tools.

The post A productive future for SAS Enterprise Guide appeared first on The SAS Dummy.

Which random number generator did Thanos use?

$
0
0

WARNING: This blog post references Avengers: Infinity War and contains story spoilers. But it also contains useful information about random number generators (RNGs) -- tempting! If you haven't yet seen the movie, you should make peace with this inner conflict before reading on.

Throughout the movie, Thanos makes it clear that his goal is to eliminate half of the population of every civilization in the universe. With the power of all six infinity stones imbued into his gauntlet, he'll be able to accomplish this with a "snap of his fingers." By the end of the film, Thanos has all of the stones, and then he literally snaps his fingers. (Really? I kept thinking that this was just a figure of speech he used to indicate how simple this will be -- but I guess it works more like the ruby slippers in The Wizard of Oz. Some clicking was required.)

So, Thanos snaps his huge fingers and -- POOF -- there goes half of us. Apparently the universe already had some sort of population-reduction subroutine just waiting for a hacker like Thanos to access it. Who put that there? Not a good plan, universe designer. (Check here to see if you survived the snap.)

But how did Thanos (or the universe) determine which of us was wiped from existence and which of us was spared? I have to assume that it was a seriously high-performing, massively parallel random number generator. And if Thanos had access to 9.4 Maintenance 5 or later (part of the Power [to Know] stone?), then he would have his choice of algorithms.

(Tony Stark has been to SAS headquarters, but we haven't seen Thanos around here. Still, he's welcome to download SAS University Edition.)

Your own RNG gauntlet, built into SAS

I know a little bit about this topic because I talked with Rick Wicklin about RNGs. As Rick discusses in his blog post, a recent release of SAS added support for several new/updated RNG algorithms, including Mersenne twister, PCG, Threefry, and one that introduces hardware-based entropy for "extra randomness." If you want to save yourself some reading, watch our 10-minute discussion here.

Implementing my own random Avengers terminator

I was going to write a SAS program to simulate Thanos' "snap," but I don't have a list of every single person in the universe (thanks GDPR!). However, courtesy of IMDB.com, I do have a list of the approximately 100 credited characters in the Infinity War movie. I wrote a DATA step to pull each name into a data set and "randomly decide" each fate by using the new PCG algorithm and the RAND function with a Bernoulli (binomial) distribution. I learned that trick from Rick's post about simulating coin flips. (I hope I did this correctly; Rick will tell me if I didn't.)

%let algorithm = PCG;
data characters;
  call streaminit("&algorithm.",2018);
  infile datalines dsd;
  retain x 0 y 1;
  length Name $ 60 spared 8 x 8 y 8;
  input Name;
  Spared = rand("Bernoulli", 0.5);
  x+1;
  if x > 10 then
    do; y+1; x = 1;end;
datalines;
Tony Stark / Iron Man
Thor
Bruce Banner / Hulk
Steve Rogers / Captain America
/* about 96 more */
;
run;

After all of the outcomes were generated, I used PROC FREQ to check the distribution. In this run, only 48% were spared, not an even 50%. Well, that's randomness for you. On the universal scale, I'm not sure that anyone is keeping track.

How many spared

Using a trick I learned from Sample 54315: Customize your symbols with the SYMBOLCHAR statement in PROC SGPLOT, I created a scatter plot of the outcomes. I included special Unicode characters to signify the result in terms that even Hulk can understand. Hearts represent survivors; frowny faces represent the vanished heroes. Here's the code:

data thanosmap;
  input id $ value $ markercolor $ markersymbol $;
  datalines;
status 0 black frowny
status 1 red heart
;
run;
 
title;
ods graphics / height=400 width=400 imagemap=on;
proc sgplot data=Characters noautolegend dattrmap=thanosmap;
  styleattrs wallcolor=white;
  scatter x=x y=y / markerattrs=(size=40) 
    group=spared tip=(Name Spared) attrid=status;
  symbolchar name=heart char='2665'x;
  symbolchar name=frowny char='2639'x;
  xaxis integer display=(novalues) label="Did Thanos Kill You? Frowny=Dead" 
    labelattrs=(family="Comic Sans MS" size=14pt);
    /* Comic Sans -- get it ???? */
  yaxis integer display=none;
run;

Scatter plot of spared

For those of you who can read, you might appreciate a table with the rundown. For this one, I used a trick that I saw on SAS Support Communities to add strike-through text to a report. It's a simple COMPUTE column with a style directive, in a PROC REPORT step.

proc report data=Characters nowd;
  column Name spared;
  define spared / 'Spared' display;
  compute Spared;
    if spared=1 then
      call define(_row_,"style",
        "style={color=green}");
    if spared=0 then
      call define(_row_,"style",
        "style={color=red textdecoration=line_through}");
  endcomp;
run;

Table of results

Remember, my results here were generated with SAS and don't match the results from the film. (I feel like I need to say that to preempt a few comments.) The complete code for this blog post is available on my public Gist.

Learn more about RNGs

Just as the end of Avengers: Infinity War has sent throngs of viewers to the Internet to find out What's Next, I expect that readers of this blog are eager to learn more about these modern random number generators. Here are the go-to articles from Rick that are worth your review:

Unanswered questions

Before Thanos completed his gauntlet, his main hobby was traveling around the cosmos reducing the population of each civilization "the hard way." With the gauntlet in hand when he snapped his fingers, did he eliminate one-half of the remaining population? Or did the universe's algorithm spare those civilizations that had already been culled? Was this a random sample with replacement or not? In the film, Thanos did not express concern about these details (typical upper management attitude), but the grunt-workers of the universe need to know the parameters for this project. Coders need exact specifications, or else you can expect less-than-heroic results from your infinity gauntlet. I'm pretty sure it says so in the owner's manual.

The post Which random number generator did Thanos use? appeared first on The SAS Dummy.

How to use SAS to read a range of cells from Excel

$
0
0

I've said it before: spreadsheets are not databases. However, many of us use spreadsheets as if they were databases, and then we struggle when the spreadsheet layout does not support database-style rigor of predictable rows, columns, and variable types -- the basic elements we need for analytics and reporting. If you're using SAS to read data from Microsoft Excel, what can you do when the data you need doesn't begin at cell A1?

By design, SAS can read data from any range of cells in your spreadsheet. In this article, I'll describe how to use the RANGE statement in PROC IMPORT to get the data you need.

With SAS 9.4 and later, SAS recommends using DBMS=XLSX for the most flexibility. It works on all operating systems without the need for additional components like the PC Files Server. Your Excel file must be in the Excel 2007-or-later format (XLSX). You do need a licence for SAS/ACCESS to PC Files. (Just learning? These DBMS=XLSX techniques also work in SAS University Edition.)

If your Excel data does not begin in cell A1 (the default start point for an import process), then you can add a RANGE= value that includes the specific cells. The easiest method is to use a Named Range in Excel to define the exact boundaries of the data.

How to add a Named Range

To define a named range in Excel, highlight the range of cells to include and simply type the new name of the range in the Name Box:
Excel named range

Then save the Excel file.

Then to import into SAS, specify that range name in the RANGE= option:

proc import datafile="/myprojects/myfile.xlsx"
 out=mydata 
 replace;
range="myspecialrange";
run;

Using Excel notation for a cell range

What if you don't know the range ahead of time? You can use PROC IMPORT to read the entire sheet, but the result will not have the column headers and types you want. Consider a sheet like this:

Excel with floating data

This code will read it:

proc import datafile="/myprojects/middle.xlsx"
 out=mid dbms=xlsx
 replace;
run;

But the result will contain many empty cells, and the values will be read as all character types:

Excel naive import

With additional coding, you can "fix" this result in another pass using DATA step. Or, if you're willing to add the RANGE option with the Excel notation for the specific cell ranges, you can read it properly in the first pass:

proc import datafile="/myprojects/middle.xlsx"
 out=mid dbms=xlsx
 replace;
 range="Sheet1$E7:K17" ;
run;

How to "discover" the structure of your Excel file

You can also use LIBNAME XLSX to read entire sheets from Excel, or simply as a discovery step to see what sheets the Excel file contains before you run PROC IMPORT. However, LIBNAME XLSX does not show the Excel named ranges.

On SAS for Windows systems, you can use LIBNAME EXCEL (32-bit) or LIBNAME PCFILES (64-bit) to reveal a little more information about the Excel file.

libname d pcfiles path="c:\myprojects\middle.xlsx";
proc datasets lib=d; quit;
 
/* always clear the libname, as it locks the file */
libname d clear;

Libname XLSX proc datasets

Note that DBMS=XLSX does not support some of the options we see in the legacy DBMS=XLS (which supports only old-format XLS files), such as STARTROW and NAMEROW. DBMS=XLSX does support GETNAMES (treats the first record of the sheet or range as the variable names). See the full reference for Excel file import/export in the SAS documentation.

The post How to use SAS to read a range of cells from Excel appeared first on The SAS Dummy.

The Internet of Snacks: SnackBot data and what it reveals about SAS life

$
0
0

At SAS, we love data. Data is central to our corporate vision: to transform a world of data into a world of intelligence. We're also famous for enjoying M&Ms, but to us they are more than a sweet snack. They're also another source of data.

My colleague Pete Privitera, with a team of like-minded "makers," built a device that they named SnackBot. SnackBot is an internet-connected sensor that measures the flow of M&Ms in a particular SAS break room. There's a lot to love about this project. You can learn more by watching its origin story in this video:

As the number of M&Ms changes, SnackBot takes a reading and records the M&M count in a database. Most readings reflect a decrease in candy pieces, as my colleagues help themselves to a treat. But once per week, the reading shows a drastic increase -- as our facilities staff restocks the canister. SnackBot has its own website. It also has its own API, and you know what that means (right?). It means that we can use SAS to read and analyze the sensor data.

Reading sensor data into SAS with the SnackBot API

The SnackBot system offers a REST API with a JSON data response. Like any REST API, we can use PROC HTTP to fetch the data, and the JSON library engine to parse the response into a SAS data set.

%let start = '20MAY2018:0:0:0'dt;
 
/* format the start/end per the API needs */
%let start_time= %sysfunc(putn(&start.,is8601dt26.));
%let end_time=   %sysfunc(datetime(),is8601dt26.);
 
/* Call the SnackBot API from snackbot.net */
filename resp temp;
proc http
  method="GET"
  url="http://snackbot.net/snackdata?start_time=&start_time.%str(&)end_time=&end_time.%str(&)utc_offset_minutes=-240"
  out=resp;
run;
 
/* JSON libname engine to read the result       */
/* Simple record layout, all in the ROOT member */
libname mms json fileref=resp;
data mmlevels;
  set mms.root;
run;

I've written about how to use SAS with REST APIs in several other blog posts, so I won't dwell on this part of the process here. This short program retrieves the raw data from SnackBot, which represents a series of M&M "levels" (count of remaining pieces) and a timestamp for each measurement. It's a good start. Though there are only two fields to work with here, there's quite a bit we can do with these data.

raw SnackBot data

Add features to the raw sensor data

With a few additional DATA step statements and some built-in SAS formats, we can derive several interesting characteristics of these data for use in further analysis.

First, we need to convert the character-formatted datetime field to a proper SAS datetime value. That's easily achieved with the INPUT function and the ANYDTDTM informat. (Rick Wicklin wrote a helpful article about how how the ANYDT* informats work.)

data mmlevels;
  set mms.root;
  drop ordinal_root timestamp;
  /* Convert the TIMESTAMP field to native value -- it's a character */
  datetime = input(timestamp, anydtdtm.);
  date = datepart(datetime);
  time = timepart(datetime);
  dow = date;
  qhour = round(datetime,'0:15:0'T);
  format  datetime datetime20. 
          qhour datetime20.
          date date9.
          time timeampm10.
          dow downame.;
run;

For convenience, I duplicated the datetime value a few times and applied different formats so we can get different views of the same value: datetime, just the date, just the time-of-day, and the day-of-week. I also used the ROUND function to "round" the raw datetime value to the nearest quarter hour. I'll explain why I've done that in a later step, but the ROUNDing of a time value is one of the documented unusual uses of the ROUND function.

SnackBot data with features

Even with this small amount of data preparation, we can begin to analyze the characteristics of these data. For example, let's look at the descriptive stats for the data classified by day-of-week:

title "SnackBot readings per day-of-week";
proc means data=mmlevels mean stddev max min;
 var pieces;
 class dow;
run;

SnackBot by day of week

The "N Obs" column shows the number of measurements taken over the entire "study period" broken down by day-of-week. If a measurement is a proxy for a "number-of-pieces-changed" event, then we can see that most events happen on Wednesday, Thursday, and Friday. From this, can you guess which day the M&M canister is refilled?

Let's take another slice through these data, but this time looking at time-of-day. For this, I used PROC FREQ to count the measurements by hour. I applied the HOUR2. format, which allows the SAS procedure to group these data into hour-long intervals with no need for additional data prep. ( I've written previously about how to use SAS formats to derive new categories without expensive data rewriting.) Then I used PROC SGPLOT to produce a step plot for the 24-hour cycle.

/* Count of readings per hour of the day */ 
title "SnackBot readings per hour";
proc freq data=mmlevels ;
 table time / out=perhour;
 format time hour2.;
run;
 
ods graphics / height=400 width=800;
 
title "SnackBot readings per hour";
proc sgplot data=perhour des="Readings per hour of day";
 step x=time y=count;
 xaxis min='0:0:0't max='24:0:0't label="Time of day" grid;
 yaxis label="Servings";
run;

SnackBot hour step

From the chart, we can see that most M&M "events" happen around 11am, and then again between 2pm and 4pm. From personal experience, I can confirm that those are the times when I hear the M&Ms calling to me.

Expand the time series to regular intervals

The SnackBot website can tell you how many M&Ms are remaining right now. But what if you want to know how many were remaining last Friday? Or on any typical Monday morning?

The sensor data that we've analyzed so far is sparse -- that is, there are data entries for each "change" event, but not for every discrete time interval in the study period. I don't know how the SnackBot sensor records its readings -- it might sample the M&M levels every minute, or every second. Regardless, the API reports (and probably stores) only the records that represent a change. If SnackBot records that the final 24 pieces were depleted at 25JUN2018:07:45:00 (a Monday morning) bringing the count to 0, how many M&Ms remain at 1pm later that day? The data don't tell us explicitly with a recorded reading. But we can assume at that point that the count was still 0. The next recorded reading occurs at 27JUN2018:10:30:00 (on a Wednesday, bringing the count to 1332 -- oh joy!).

If we want to create a useful time series visualization of the M&M candy counts over time, we need to expand the time series from these sparse recordings to regular intervals. SAS offers a few sophisticated time series procedures to accomplish this: PROC EXPAND, PROC TIMESERIES, and PROC TIMEDATA. Each of these offer powerful econometrics methods for interpolation and forecasting -- and that's more than we need for this situation. For my example, I took a more low-tech approach.

First, I created an empty data set with datetime entries at quarter-hour intervals, covering the study period of the data we're looking at.

/* Empty data set with 15 minute interval slots    */
/* Regular intervals for the entire "study" period */
data timeslots;
  last = datetime();
  length qhour 8;
  format qhour datetime20;
  drop last i;
  do i = &start. to last by '0:15:00't;
    qhour = i;
    output;
  end;
run;

Then I used a DATA step to merge these empty slots with the actual event data that I had rounded to the nearest quarter hour (remember that?):

/* Merge the sample data with the timeslots */
data expand;
  merge mmlevels(keep=pieces qhour) timeslots;
  by qhour;
run;

Finally, I used a variation of a last-observation-carried-forward (LOCF) approach to fill in the remaining empty slots. If a reading at 20MAY2018:11:15:00 reports 132 pieces remaining, then that value should be RETAINed for each 15-minute slot until the next reading at 20MAY2018:17:30:00. (That reading is 82 pieces -- meaning somebody helped themselves to 50 pieces. Recommended serving size for plain M&Ms is 20 pieces, but I'm not passing judgement.) I also recorded a text value for the day-of-week to help with the final visualization.

/* for empty timeslots, carry the sample data   */
/* forward, so we always have a count of pieces */
/* Variation on a LOCF technique                */
data final;
  set expand;
  length day $ 3;
  /* 3-char value for day of week */
  day=put(datepart(qhour),weekdate3.);
  retain hold;
  if not missing(pieces) then
    hold=pieces;
  else pieces=hold;
  drop hold;
  if not missing(pieces);
run;

Now I have data that represents the regular intervals that we need.

SnackBot regular intervals

Putting it all together

For my final visualization, I created a series plot for the study period. It shows the rise and fall of M&Ms levels in one SAS break room over several weeks. For additional "color", I annotated the plot with a block chart to delineate the days of the week.

title 'Plain M&M pieces on S1 tracked by SnackBot';
ods graphics / height=300 width=1600;
 
proc sgplot data=final des='M&M pieces tracked by SnackBot';
 
  /* plot the data as a series */ 
  series x=qhour y=pieces / lineattrs=(color=navy thickness=3px);
 
  /* Yes, these are the "official" M&M colors               */
  /* Will be applied in data-order, so works best when data */
  /* begins on a Sunday                                     */
  styleattrs datacolors=(red orange yellow green blue CX593B18 red);
  /* block areas to indicate days-of-week                   */
  block x=qhour block=day / transparency=0.65
    valueattrs=(weight=bold size=10pt color=navy);
 
  xaxis minor display=(nolabel);
  yaxis display=(nolabel) grid max=1600 minor;
run;

You can see the pattern. M&Ms are typically filled on Wednesday to the canister capacity of about 1400 pieces. We usually enter into the weekend with 0 remaining, but there are exceptions. The week of May 27 was our Memorial Day holiday, which explains the lack of activity on Monday (and even Tuesday) during that week as SAS folks took advantage of a slow week with their vacation plans.

SnackBot visualization

More about SAS and M&Ms data

You can download the complete code for this example from my public Gist on GitHub. The example code should work with SAS University Edition and SAS OnDemand for Academics, as well as with any SAS environment that can reach the internet with PROC HTTP.

For more M&M data fun, check out Rick Wicklin's article about the distribution of colors in plain M&Ms. SnackBot does not (yet) report on how many and which color of M&Ms are taken per serving, but using statistics, we can predict that!

Finally, I'd like to (humbly!) point out that my 2009 blog post about a snack-driven dashboard was remarkably prescient. True, I was riffing off of the idea of the internet-connected Coke machine from the 1980s, but thanks to SnackBot for making my dream come to life.

The post The Internet of Snacks: SnackBot data and what it reveals about SAS life appeared first on The SAS Dummy.

Using %IF-%THEN-%ELSE in SAS programs

$
0
0

SAS programmers have long wanted the ability to control the flow of their SAS programs without having to resort to complex SAS macro programming. With SAS 9.4 Maintenance 5, it's now supported! You can now use %IF-%THEN-%ELSE constructs in open code. This is big news -- even if it only recently came to light on SAS Support Communities. (Thanks to Super User Tom for asking about it.)

Prior to this change, if you wanted to check a condition -- say, whether a data set exists -- before running a PROC, you had to code it within a macro routine. It would look something like this:

/* capture conditional logic in macro */
%macro SummarizeIfExists();
 %if %sysfunc(exist(work.result)) %then
  %do;
    proc means data=work.result;
    run;
  %end; 
%else
  %do;
    %PUT WARNING: Missing WORK.RESULT - report process skipped.;
  %end;
%mend;
 
/* call the macro */
%SummarizeIfExists();

Now you can simplify this code to remove the %MACRO/%MEND wrapper and the macro call:

/* If a file exists, take an action */
/* else fail gracefully */
%if %sysfunc(exist(work.result)) %then
  %do;
    proc means data=work.result;
    run;
  %end;
%else
  %do;
    %PUT WARNING: Missing WORK.RESULT - report process skipped.;
  %end;

Here are some additional ideas for how to use this feature. I'm sure you'll be able to think of many more!

Run "debug-level" code only when in debug mode

When developing your code, it's now easier to leave debugging statements in and turn them on with a simple flag.

/* Conditionally produce debugging information */
%let _DEBUG = 0; /* set to 1 for debugging */
%if &_DEBUG. %then
  %do;
    proc print data=sashelp.class(obs=10);
    run;
  %end;

If you have code that's under construction and should never be run while you work on other parts of your program, you can now "IF 0" out the entire block. As a longtime C and C++ programmer, this reminds me of the "#if 0 / #endif" preprocessor directives as an alternative for commenting out blocks of code. Glad to see this in SAS!

/* skip processing of blocks of code */
/* like #if 0 / #endif in C/C++      */
%if 0 %then
  %do;
    proc ToBeDetermined;
      READMYMIND = Yes;
    run;
  %end;

Run code only on a certain day of the week

I have batch jobs that run daily, but that send e-mail to people only one day per week. Now this is easier to express inline with conditional logic.

/*If it's Monday, send a weekly report by email */
%if %sysfunc(today(),weekday1.)=2 %then
  %do;
    options emailsys=smtp emailhost=myhost.company.com;
    filename output email
      subject = "Weekly report for &SYSDATE."
      from = "SAS Dummy <sasdummy@sas.com>"
      to = "knowledgethirster@curious.net"
      ct ='text/html';
 
  ods tagsets.msoffice2k(id=email) 
    file=OUTPUT(title="Important Report!")
    style=seaside;
   title "The Weekly Buzz";
   proc print data=amazing.data;
   run;
  ods tagsets.msoffice2k(id=email) close;
  %end;

Check a system environment variable before running code

For batch jobs especially, system environment variables can be a rich source of information about the conditions under which your code is running. You can glean user ID information, path settings, network settings, and so much more. If your SAS program needs to pick up cues from the running environment, this is a useful method to accomplish that.

/* Check for system environment vars before running code */
%if %sysfunc(sysexist(ORACLE_HOME)) %then
  %do;
    %put NOTE: ORACLE client is installed.;
    /* assign an Oracle library */
    libname ora oracle path=corp schema=alldata authdomain=oracle;
  %end;

Limitations of %IF/%THEN in open code

As awesome as this feature is, there are a few rules that apply to the use of the construct in open code. These are different from what's allowed within a %MACRO wrapper.

First rule: your %IF/%THEN must be followed by a %DO/%END block for the statements that you want to conditionally execute. The same is true for any statements that follow the optional %ELSE branch of the condition.

And second: no nesting of multiple %IF/%THEN constructs in open code. If you need that flexibility, you can do that within a %MACRO wrapper instead.

And remember, this works only in SAS 9.4 Maintenance 5 and later. That includes the most recent release of SAS University Edition, so if you don't have the latest SAS release in your workplace, this gives you a way to kick the tires on this feature if you can't wait to try it.

The post Using %IF-%THEN-%ELSE in SAS programs appeared first on The SAS Dummy.

SAS code golf: find the max digit in a string of digits

$
0
0

"Code golf" is a fun programming pastime that challenges you to solve a problem with the least amount of code possible. Like regular golf, the goal is to use fewest code "strokes" to hit the mark. Here's a recent challenge that was posted to me via Twitter.

While I feel that I can solve nearly any problem (that I can understand) using SAS, my knowledge of the SAS language is quite limited when compared to that of many experts.  And so, I reached out to the SAS Support Communities for help on this one.

The answers were quick, creative, and diverse.  I'll share a few of them here.

The winner, in terms of concision, came from FreelanceReinhard.  He supplied a macro-function one-liner:

%sysfunc(findc(123456789,000112010302,b));

With this entry, FreelanceReinhard defied a natural algorithmic instinct to treat this as a numerical digit comparison problem, and instead approached it as simple pattern matching problem.  The highest digit comes from a finite set (0..9).  The FINDC function can tell you which of those digits is the first to be found in the target string.  The b directive tells FINDC to work backwards through the pattern, from '9' down to '0'.

In a similar vein, novinosrin's approach uses the COMPRESS function to keep only the highest digits from the pattern, in descending order, and then applies the FIRST function to return the top value.

a=first(compress('9876543210','000112010302','k'));

The COMPRESS function is often used to eliminate matching characters from a string, but the k directive inverts the action to keep only the matching characters instead.

If you wanted to use the more traditional approach of looping through values, comparing, and keeping just the maximum value, then you can hardly do better than the code offered by hashman.

do j = 1 to length (str) ; 
  d = d <> input (char (str, j), 1.) ;
end ;

Experienced SAS programmers will remember that the <> operator is shorthand for MAX (as opposed to "not equal" as some of us learned in Pascal or SQL).  "MAX" might be clearer to read, but it requires an additional character.  (Remember the "><" is shorthand for the MIN operator in SAS.)

AhmedAl_Attar offered the most dangerous approach, using memory manipulation techniques to populate members of an array:

array ct [20] $1 _temporary_;
call pokelong (str,addrlong(ct[1]),length(str));
c=max(of ct{*});

CALL POKELONG and ADDRLONG are documented along with several cautions due to the risk of overwriting something important in your process or system memory.  But, they are fast-acting.

And finally, I knew that there would be an elegant matrix-based approach in SAS/IML.  ChanceTGardener offered the first variant, and then Rick Wicklin echoed it shortly after.

proc iml;
  str='000112010302';
  maximum=max((substr(str,1:length(str),1)));
  print maximum;
quit;

Code golf does not always produce the most readable, maintainable code. But puzzles like these encourage us to explore new features and nuanced behaviors of our favorite programming language, and thus broaden our understanding of how SAS really works.

Appendix: Code for featured solutions

Want to experiment with these different approaches?  Here's a SAS program that combines all of them. Think you can do better (or different)? Visit the communities topic and chime in.

data max;
  str = '000112010302';
 
  /* novinosrin's approach */
  a=first(compress('9876543210',str,'k'));
 
  /* FreelanceReinhard's approach */
  b=findc('123456789',str,-9);
 
  /* AhmedAl_Attar's approach using POKELONG */
  array ct [20] $1 _temporary_;
  call pokelong (str,addrlong(ct[1]),length(str));
  c=max(of ct{*});
 
  /* loop approach from hashman */
  /* remember that <> is MAX    */
  do j = 1 to length (str) ;            
    d = d <> input (char (str, j), 1.) ;
  end ;
 
drop j;
run;
 
/* FreelanceReinhard's approach in a one-liner macro function */
%let str=000112010302;
%put max=%sysfunc(findc(123456789,&str.,b));
 
/* IML approach from ChanceTGardener */
/* Requires SAS/IML to run           */
proc iml;
  str='000112010302';
  maximum=max((substr(str,1:length(str),1)));
  print maximum;
quit;

The post SAS code golf: find the max digit in a string of digits appeared first on The SAS Dummy.


Manage the current directory within your SAS program

$
0
0

The concept of "current working directory" is important within any SAS program that reads or creates external files. In SAS, when you reference a file location with a relative path (for example, "./projects/mydata.pdf"), that file reference resolves to an absolute path by way of the working directory. You can control the initial working directory by modifying the shell scripts that launch the SAS process, or by specifying the SASINITIALFOLDER option in the SAS command or config file. But if you're using SAS Enterprise Guide or SAS Studio, it's likely that you don't have the ability to control how the SAS process starts. But don't worry -- there is still a way to find and change the working directory within your SAS programs.

Find the current directory in SAS

Tom (one of our SAS Communities Super Users) shared a simple SAS macro that allows you to learn the current working directory. The macro uses a trick to assign a SAS fileref to the current path ('.'), grab the full path of that fileref by using the PATHNAME function, and then clear the fileref.

Read the article for the full source (it's only about 7 lines). Here's how you would use it:

56         %put Current path is %curdir;
Current path is C:\WINDOWS\system32

As you might infer from my example here, I'm running this on a managed Windows environment. Most users cannot write to the "C:\WINDOWS\system32" path (and would not want to), so any relative file paths in my SAS code would cause errors. Maybe you've seen something like this:

25         ods html file="./test.html";
NOTE: Writing HTML Body file: ./test.html
ERROR: Insufficient authorization to access C:\WINDOWS\system32\test.html.
ERROR: No body file. HTML output will not be created.

If I want to use a relative path, I need to change the current working directory. Fortunately, there's a simple way to do that.

Change the current directory in SAS

Use the DLGCDIR function to change the working directory for your current SAS session. The DLGCDIR function was added in SAS 9.4 Maintenance 4 (released in 2016). Now I can change the working directory early in my program, and all relative file paths will resolve accordingly.

/* working path for my projects */
%let rc = %sysfunc(dlgcdir('u:/projects'));
 
ods html file="./test.html";
proc print data=sashelp.class; run;
ods html close;

I can use my account-specific environment variables to make these paths work for all users. For example, on Windows I can reference the USERPROFILE environment variable. (On Unix, I can use the HOME environment variable instead.)

/* working path for my projects */
%let user = %sysget(USERPROFILE);
%let rc = %sysfunc(dlgcdir("&user./Documents"));
 
/* create an output data folder if needed */
options dlcreatedir;
libname outdata "./data";
 
ods html file="./test.html";
data outdata.class;
 set sashelp.class;
run;
proc print data=outdata.class; run;
ods html close;

Here's my log output. Notice how the HTML file and the output data folder are both created at locations relative to my home directory.

25         /* working path for my projects */
26         %let user = %sysget(USERPROFILE);
27         %let rc = %sysfunc(dlgcdir("&user./Documents"));
NOTE: The current working directory is now "C:\Users\sascrh\Documents".
28         
29         options dlcreatedir;
30         libname outdata "./data";
NOTE: Library OUTDATA was created.
NOTE: Libref OUTDATA was successfully assigned as follows: 
      Engine:        V9 
      Physical Name: C:\Users\sascrh\Documents\data
31         
32         ods html file="./test.html";
NOTE: Writing HTML Body file: ./test.html
33         data outdata.class;
34          set sashelp.class;
35         run;

If using SAS Enterprise Guide, you can add DLGCDIR function steps to the startup statements that run when you connect to SAS, ensuring that your working directory starts in a valid location for SAS output. You can specify those statements in Tools->Options->SAS Programs, "Submit SAS code when server is connected." A SAS administrator can also add code to the AUTOEXEC file that runs when the SAS session begins, thus helping to manage this for larger groups of SAS users.

See also

The post Manage the current directory within your SAS program appeared first on The SAS Dummy.

Essential SAS tools to bring to your next hackathon

$
0
0

To succeed in any data-focused hackathon, you need a robust set of tools and skills – as well as a can-do attitude.  Here's what you can expect from any hackathon:

  • Messy data.   It might come from a variety of sources, and won't necessarily be organized for analytics or reporting.  That's your job.
  • Nebulous problem set. Usually the goal of a hackathon is to generate insights, improve a situation, or optimize a process. But you don't know going into it which insights you need, which process is ripe for optimization, or which situations can be improved by using data.  Hackathons are as much about discovering opportunities as they are about solving problems.
  • Team members with different viewpoints. This is a big strength of hackathons, and it can also present the biggest challenge.  Team members bring different skills and ideas.  To be successful, you need to be open to those ideas and to allowing team members to contribute in the way that best uses their skills.  Think of yourselves as the Oceans Eleven of data analytics.

In my experience, hackathons are often a great melting pot of different tools and technologies.  Whatever tech biases you might have in your day job (Windows versus Linux, SAS versus Python, JSON versus CSV) – these melt away when your teammates show up ready to contribute to a common goal using the tools that they each know best.

My favorite hackathon tools

At the Analytics Experience 2018 Hackathon, attendees have the entire suite of SAS tools available.  From Base SAS, to SAS Enterprise Guide, to SAS Studio, to SAS Enterprise Miner and the entire SAS Viya framework -- including SAS Visual Analytics, SAS Visual Text Analytics, SAS Data Mining and Machine Learning.  As we say here in San Diego, it's the whole enchilada.  As the facilitators were presenting the whirlwind tour of all of these goodies, I could see the attendees salivating.  Or maybe that was just me.

When it comes to getting my hands dirty with unknown data, my favorite path begins with SAS Enterprise Guide.  If you know me, this won't surprise you.  Here's why I like it.

Import Data task: Import any data

Hackathon data almost always comes as CSV or Excel spreadsheets.  The Import Data task can ingest CSV, fixed-width text, and Excel spreadsheets of any version.  Of course most "hackers" worth their salt can write code to read these file types, but the Import Data task helps you to discover what's in the file almost instantly.  You can review all of the field names and types, tweak them as you like, and click Finish to produce a data set.  There's no faster method of turning raw data into a SAS data set that feeds the next step.

See Tricks for importing text files and Importing Excel files using SAS Enterprise Guide for more details about the ins-and-outs of this task.  If you want to ultimately turn this step into repeatable code (a great idea for hackathons), then it's important to know how this task works.

Note: if your data is coming from a web service or API, then it's probably in JSON format.  There's no point-and-click task to read that, but a couple of SAS program lines will do the trick.

Query Builder: Filter, compute, summarize, and join

The Query Builder in SAS Enterprise Guide is a one-stop shop for data management.  Use this for quick filtering, data cleansing, simple recoding, and summarizing across groups. Later, when you have multiple data sources, the Query Builder provides simple methods to join these – merge on the fly.

Before heading into your next hackathon, it's worth exploring and practicing your skills with the Query Builder.  It can do so much -- but some of the functions are a bit hidden.  Limber up before you hack!

See this paper by Jennifer First-Kluge for an in-depth tour of the tool.

Characterize Data: Quick data characteristics, with ability to dive deeper

If you've never seen your data before, you'll appreciate this one-click method to report on variable types, frequencies, distinct values, and distributions.  The Describe->Characterize Data task provides a good start.

Using SAS Studio? There's a Characterize Data task in there as well.  See Marje Fecht's paper: Easing into Data Exploration, Reporting, and Analytics Using SAS Enterprise Guide for more about this and other tasks.

Data tasks: Advanced data reworking: long to wide, wide to long

"Long" data is typically best for reporting, while "wide" data is more suited for analytics and modeling  The process of restructuring data from long to wide (or wide to long) is called Transpose.  SAS Enterprise Guide has special tasks called "Split Data" (for making wide tables) and "Stack Data" (for making long data).  Each method has some special requirements for a successful transformation, so it's worth your time to practice with these tasks before you need them.

Program Editor: Flexible coding environment

The program editor in SAS Enterprise Guide is my favorite place to write and modify SAS code.  Here are my favorite tricks for staying productive in this environment including code formatting, shown below.

autoformat code

Have another favorite editor?  You can use SAS Enterprise Guide to open your code in your default Windows editor too.  That's a great option when you need to do super-fancy text manipulation.  (We won't go into the "best programming editor" debate here, but I've got my defaults set up for Notepad++.)

Export and share with others

The hackathon "units of sharing" are code (of course) and data.  SAS Enterprise Guide provides several simple methods to share data in a way that just about any other tool can consume:

  • Export data as CSV (CSV is the lingua franca of data sharing)
  • Export data as Excel (if that's what your teammates are using)
  • Send to Excel -- actually my favorite way to generate ad-hoc Excel data, as it automates Microsoft Excel and pipes the data your looking at directly into a new sheet.
  • Copy / paste with headers -- low-tech, but this gets you exactly the columns and fields that you want to share with another team member.

When it comes to sharing code, you can use File->Export All Code to capture all SAS code from your project or process flow.  However, I prefer to assemble my own "standalone" code piecemeal, so that I can make sure it's going to run the same for someone else as it does for me.  To accomplish this, I create a new SAS program node and copy the code for each step that I want to share into it...one after another.  Then I test by running that code in a new SAS session.  Validating your code in this way helps to reduce friction when you're sharing your work with others.

Hacking your own personal growth

The obvious benefit of hackathons is that at the end of a short, intense period of work, you have new insights and solutions that didn't have before – and might never have arrived at on your own.  But the personal benefit comes in the people you meet and the techniques that you learn.  I find that I'm able to approach my day job with fresh perspective and ideas – the creativity keeps flowing, and I'm energized to apply what I've learned in my business.

The post Essential SAS tools to bring to your next hackathon appeared first on The SAS Dummy.

How to read multiple text files in SAS

$
0
0

As part of my research for a different article, I recently collected data about my driving commute home via an accelerometer recorder app on my phone. The app generates a simple TSV file. (A TSV file is like a CSV file, but instead of a comma separator, it uses a TAB character to separate the values.) The raw data looks like this:

Related from Analytics Experience 2018: Using your smartphone accelerometer to build a safe driving profile

With SAS, it's simple to import the file into a data set. Here's my DATA step code that uses the INFILE statement to identify the file and how to read it. Note that the DLM= option references the hexadecimal value for the TAB character in ASCII (09x), the delimiter for fields in this data.

data drive;
  infile "/home/chris.hemedinger/tsv/drivehome.tsv" 
   dlm='09'x;
  length counter 8 
         timestamp 8 
         x 8 y 8 z 8 
         filename $ 25;
  input counter timestamp x y z filename;
run;

In my research, I didn't stop with just my drive home. In addition to my commute, I collected data about 4 other activities, and thus accumulated a collection of TSV files. Here's my file directory in my SAS OnDemand for Academics account:

To import each of these data files into SAS, I could simply copy and paste my code 4 times and then replace the name of the file for each case that I collected. After all, copy-and-paste is a tried and true method for writing large volumes of code. But as the number of code lines grows, so does the maintenance work. If I want to add any additional logic into my DATA step, that change would need to be applied 5 times. And if I later come back and add more files to my TSV collection, I'll need to copy-and-paste the same code blocks for my additional cases.

Using a wildcard on the INFILE statement

I can read all of my TSV files in a single step by using the wildcard notation in my INFILE statement. I replaced the "drivehome.tsv" filename with *.tsv, which tells SAS to match on all of the TSV files in the folder and process each of them in turn. I also changed the name of the data set from "drive" to the more generic "accel".

data accel;
  infile "/home/chris.hemedinger/tsv/*.tsv" 
   dlm='09'x;
  length counter 8 
         timestamp 8 
         x 8 y 8 z 8 
         filename $ 25;
  input counter timestamp x y z filename;
run;

The SAS log shows which files have been processed and added into my data set.

With a single data set that has all of my accelerometer readings, I can easily segment these with a WHERE clause in later processing. It's convenient that my accelerometer app also captured the name of each TSV file so that I can keep these cases distinct. A quick PROC FREQ shows the allocation of records for each case that I collected.

Add the filename into the data set

If your input data files don't contain an eponymous field name, then you will need to use a different method to keep track of which records come from which files. The DATA step INFILE statement supports a special option for this -- it's the FILENAME= option. The FILENAME= option allows you to specify an automatic variable to store the name of the current input file. Like all automatic variables in the DATA step, this variable won't be written to the output data set by default. So, you need to include code that assigns the value to a permanent variable that you can save. Here's my code, refactored a little bit, now with the options for reading/storing the file names we're processing:

filename tsvs "/home/chris.hemedinger/tsv/*.tsv";
data accel;
  length casefile $ 100 /* to write to data set */
     counter 8 
     timestamp 8 
     x 8 y 8 z 8 
     filename $ 25	       
     tsvfile $ 100 /* to hold the value */	       
   ;
 
   /* store the name of the current infile */       
   infile tsvs filename=tsvfile 
    dlm='09'x ; 
  casefile=tsvfile;
  input counter timestamp x y z filename;	
run;

In the output, you'll notice that we now have the fully qualified file name that SAS processed using INFILE.

Managing data files: fewer files is better

Because we started this task with 5 distinct input files, it might be tempting to store the records in separate tables: one for each accelerometer case. While there might be good reasons to do that for some types of data, I believe that we have more flexibility when we keep all of these records together in a single data set. (But if you must split a single data set into many, here's a method to do it.)

In this single data set, we still have the information that keeps the records distinct (the name of the original files), so we haven't lost anything. SAS procedures support CLASS and BY statements that allow us to simplify our code when reporting across different groups of data. We'll have fewer blocks of repetitive code, and we can accomplish more across all of these cases before we have to resort to SAS macro logic to repeat operations for each file.

As a simple example, I can create a simple visualization with a single PROC SGPANEL step.

ods graphics / width=1600 height=400;
proc sgpanel data=accel;
 panelby filename / columns=5 noheader;
 series x=counter y=x;
 series x=counter y=y;
 series x=counter y=z;
 colaxis display=none minor;
 rowaxis label="m/s**2" grid;
 where counter<11000;
run;

Take a look at these 5 series plots. Using just what you know of the file names and these plots, can you guess which panel represents which accelerometer case?

Leave your guess in the comments section. I'll explore these data further in a future blog post!

The post How to read multiple text files in SAS appeared first on The SAS Dummy.

Reporting on accelerometer data with SAS Visual Analytics

$
0
0

A few weeks ago I posted a cliffhanger-of-a-blog-post. I left my readers in suspense about which of my physical activities are represented in different sets of accelerometer data that I captured. In the absence of more details from me, the internet fan theories have been going wild. Well, it's time for the big reveal! I've created a SAS Visual Analytics report that shows each of these activity streams with the proper label:

Accelerometer measurements per activity -- click to enlarge!

Were your guesses confirmed? Any surprises? Were you more impressed with my safe driving or with my reckless behavior on the trampoline?

Collecting and preparing accelerometer data

You might remember that this entire experiment was inspired by a presentation from Analytics Experience 2018. That's when I learned about an insurance company that built a smartphone app to collect data about driving behavior, and that the app relies heavily on accelerometer readings. I didn't have time or expertise to build my own version of such an app, but I found that there are several good free apps that can collect and export this data. I used an app called AccDataRec on my Android phone.

Each "recording session" generates a TSV file -- a tab-separated file that contains a timestamp and a measurement for each of the accelerometer axes (X, Y, and Z). In my previous post, I shared tips about how to import multiple TSV files in a single step. Here's the final version of the program that I wrote to import these data:

filename tsvs "./accel/*.tsv";
libname out "./accel";
 
data out.accel;
  length 
    casefile $ 100 /* to write to data set */
    counter 8 
    timestamp 8 
    timestamp_sec 8
    x 8 y 8 z 8 
    filename $ 25        
    tsvfile $ 100 /* to hold the value */
  ;
  format timestamp datetime22.3 timestamp_sec datetime20.;
 
  /* store the name of the current infile */
  infile tsvs filename=tsvfile expandtabs;
  casefile=tsvfile;
  input counter timestamp x y z filename;
 
  /* convert epoch time into SAS time */
  timestamp=dhms('01jan1970'd, 0, 0, timestamp / 1000);
 
  /* create a timestamp with the precision of one second */
  timestamp_sec = intnx('second',timestamp,0);
run;

Some notes:

  • I converted the timestamp value from the data file (an epoch time value) to a native SAS datetime value by using this trick.
  • Following advice from readers on my last post, I changed the DLM= option to a more simple EXPANDTABS option on the INFILE statement.
  • Some of the SAS time-series analysis doesn't like the more-precise timestamp values with fractions of seconds. I computed a less precise field, rounding down to the second, just in case.
  • For my reports in this post, I really need only 5 fields: counter (the ordinal sequence of measurements), x, y, z, and the filename (mapping to activity).

The new integrated SAS Viya environment makes it simple to move from one task to another, without needing to understand the SAS product boundaries. I used the Manage Data function (that's SAS Data Management, but does that matter?) to upload the ACCEL data set and make it available for use in my reports. Here's a preview:

Creating a SAS Visual Analytics report

With the data now available and loaded into memory, I jumped to the Explore and Visualize Data activity. This is where I can use my data to create a new SAS Visual Analytics report.

At first, I was tempted to create a Time Series Plot. My data does contain time values, and I want to examine the progression of my measurements over time. However, I found the options of the Time Series Plot to be too constraining for my task, and it turns out that for this task the actual time values really aren't that important. What's important is the sequence of the measurements I've collected, and that's captured as an ordinal in the counter value. So, I selected the Line Plot instead. This allowed for more options in the categorical views -- including a lattice row arrangement that made it easy to see the different activity patterns at a glance. This screen capture shows the Role assignments that I selected for the plot.

Adding a closer view at each activity

With the overview Line Plot complete, it's time to add another view that allows us to see just a single activity and provide a close-up view of its pattern. I added a second page to my report and dropped another Line Plot onto the canvas. I assigned "counter" to the category and the x, y, and z values to the Measures. But instead of adding a Lattice Row value, I added a Button Bar to the top of the canvas. My idea is to use the Button Bar -- which is good for navigating among a small number of values -- as a way to trigger a filter for the accelerometer data.

I assigned "filename" to the Category value in the Button Bar role pane. Then I used the Button Bar options menu (the vertical dots on the right) to add a New filter from selection, selecting "Include only selection".

With this Button Bar control and its filter in place, I can now switch among the data values for the different activities. Here's my "drive home" data -- it looks sort of exciting, but I can promise you that it was a nice, boring ride home through typical Raleigh traffic.

Phone mounted in my car for the drive home

The readings from the "kitchen table" activity surprised me at first. This activity was simply 5 minutes of my phone lying flat on my kitchen table. I expected all readings to hover around zero, but the z axis showed a relatively flat line closer to 10 meters-per-second-per-second. Then I remembered: gravity. This sensor registers Earth's gravity, which we are taught is 9.8 meters-per-second-per-second. The readings from my phone hovered around 9.6 -- maybe my house is in a special low-gravity zone, or the readings are a bit off.

Phone at rest on my kitchen table

Finally, let's take a closer look at my trampoline workout. Since I was holding my phone upright, it looks like the x-axis felt the brunt of the acceleration forces. According to these readings, my phone was subjected to a g-force of 7 or 8 times that of Earth's gravity -- but just for a split second. And since my phone was in my hand and my arm was flailing around (I am not a graceful rebounder), my phone was probably experiencing more force than my body was.

Bounding on the trampoline as high as I can

Some love for the Windows 10 app

My favorite method to view SAS Visual Analytics reports is through the SAS Visual Analytics application that's available for Windows 10 and Windows mobile devices. Even on my desktop, where I have a full web browser to help me, I like the look and feel of the specialized Windows 10 app. The report screen captures for this article were rendered in the Windows 10 app. Check out this article for more information about the app. You can try the app for free, even without your own SAS Viya environment. The app is hardwired with a connection to the SAS demo reports at SAS.com.

See also

This is the third (and probably final) article in my series about accelerometer data. See these previous posts for more of the fun background information:

The post Reporting on accelerometer data with SAS Visual Analytics appeared first on The SAS Dummy.

Create newline-delimited JSON (or JSONL) with SAS

$
0
0

JSON is a popular format for data exchange between APIs and some modern databases. It's also used as a way to archive audit logs in many systems. Because JSON can represent hierarchical relationships of data fields, many people consider it to be superior to the CSV format -- although it's certainly not yet universal.

I learned recently that newline-delimited JSON, also called JSONL or JSON Lines, is growing in popularity. In a JSONL file, each line of text represents a valid JSON object -- building up to a series of records. But there is no hierarchical relationship among these lines, so when taken as a whole the JSONL file is not valid JSON. That is, a JSON parser can process each line individually, but it cannot process the file all at once.

In SAS, you can use PROC JSON to create valid JSON files. And you can use the JSON libname engine to parse valid JSON files. But neither of these can create or parse JSONL files directly. Here's a simple example of a JSONL file. Each line, enclosed in braces, represents valid JSON. But if you paste the entire body into a validation tool like JSONLint, the parsing fails.

JSONL sample, not valid in JSON parsers

If we needed these records to be true JSON, we need a hierarchy. This requires us to set off the rows with more braces and brackets and separate them with commas, like this:

Valid JSON with hierarchies

Creating JSONL with PROC JSON and DATA step

In a recent SAS Support Communities thread, a SAS user was struggling to use PROC JSON and a SAS data set to create a JSONL file for use with the Amazon Redshift database. PROC JSON can't create the finished file directly, but we can use PROC JSON to create the individual JSON object records. Our solution looks like this:

  • Use PROC JSON to read each record of the source data set and create a new JSON file. DATA step and CALL EXECUTE can generate these steps for us.
  • Using DATA step, post-process the collection of JSON files and append these into a final JSONL file.

Here's the code we used. You need to change only the output file name and the source SAS data set.

/* Build a JSONL (newline-delimited JSON) file */
/* from the records in a SAS data set          */
filename final "c:\temp\final.jsonl" ;
%let datasource = sashelp.class;
 
/* Create a new subfolder in WORK to hold */
/* temp JSON files, avoiding conflicts    */
options dlcreatedir;
%let workpath = %sysfunc(getoption(WORK))/json;
libname json "&workpath.";
libname json clear;
 
/* Will create a run a separate PROC JSON step */
/* for each record.  This might take a while   */
/* for very large data.                        */
/* Each iteration will create a new JSON file  */
data _null_;
 set &datasource.;
 call execute(catt('filename out "',"&workpath./out",_n_,'.json";'));
 call execute('proc json out=out nosastags ;');
 call execute("export &datasource.(obs="||_n_||" firstobs="||_n_||");");
 call execute('run;');
run;
 
/* This will concatenate the collection of JSON files */
/* into a single JSONL file                           */
data _null_;
 file final encoding='utf-8' termstr=cr;
 infile "&workpath./out*.json";
 input;
 /* trim the start and end [ ] characters */
 final = substr(_infile_,2,length(_infile_)-2);
 put final;
run;

From what I've read, it's a common practice to compress JSONL files with gzip for storage or faster transfers. That's a simple step to apply in our example, because SAS supports a GZIP method in SAS 9.4 Maintenance 5. To create a gzipped final result, change the first FILENAME statement to something like:

filename final ZIP "c:\temp\final.jsonl.gz" GZIP;

The JSONL format is new to me and I haven't needed to use it in any of my applications. If you use JSONL in your work, I'd love to hear your feedback about whether this approach would create the types of files you need.

The post Create newline-delimited JSON (or JSONL) with SAS appeared first on The SAS Dummy.

Viewing all 234 articles
Browse latest View live